This story uses a public dataset of , American high school yearbook photos from the years – 2013, created and published by Shiry Ginosar, Kate Rakelly, Sarah Sachs, Brian Yin, and Alexei A. Efros. All faces are front-facing and aligned by eye position. Complete details about the creation of this dataset can be found in an article by the authors.
Image processing pipeline
In order to analyze hair in the yearbook dataset, we created a processing pipeline with three stages:
- Hair segmentation. Identify the pixels in a portrait that correspond to hair.
- Feature detection. Summarize important features of the hairstyle.
Identifying the hair in a portrait is an example of semantic segmentation, a challenging problem in computer vision. In our case, the computer's task is to decide, for every pixel in an image, whether it is a hair pixel or not. We leveraged an existing deep learning approach called a U-Net, which has shown promising results in biomedical image segmentation (e.g., identifying a tumor or lesion in a scan).
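To make the per-pixel framing concrete, here is a minimal sketch (not the story's code) of how a hair probability map can be turned into a binary hair mask and scored against a hand-labeled mask. The 0.5 threshold and the intersection-over-union (IoU) and Dice metrics are common conventions assumed here; the story does not say which, if any, metric was used.

```python
import numpy as np

def binarize(hair_prob: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Turn a per-pixel hair probability map into a boolean hair mask."""
    return hair_prob >= threshold

def iou(pred: np.ndarray, truth: np.ndarray) -> float:
    """Intersection over union of two masks (defined as 1.0 if both are empty)."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    union = np.logical_or(pred, truth).sum()
    if union == 0:
        return 1.0
    return float(np.logical_and(pred, truth).sum() / union)

def dice(pred: np.ndarray, truth: np.ndarray) -> float:
    """Dice coefficient: 2|A and B| / (|A| + |B|), 1.0 if both masks are empty."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    total = pred.sum() + truth.sum()
    if total == 0:
        return 1.0
    return float(2 * np.logical_and(pred, truth).sum() / total)

# Toy 2x2 example: model probabilities vs. a hand-drawn mask.
prob = np.array([[0.9, 0.2], [0.6, 0.4]])
truth = np.array([[True, True], [False, False]])
pred = binarize(prob)
print(iou(pred, truth), dice(pred, truth))
```

IoU penalizes both missed hair and false hair pixels, which is why it is a common choice for judging whether a predicted mask is "good enough."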
We adopted a popular U-Net architecture for this story (code here). To train the U-Net to segment hair, we needed a labeled dataset containing images similar to the yearbook images. The closest existing dataset is Figaro1K, which consists of 1, labeled images containing hair of many textures, colors, and styles. However, this dataset does not contain historical images, and it differs considerably from yearbook photos in image composition and possibly in many other features, such as contrast, dynamic range, and sharpness. We expanded the training dataset by hand-segmenting over 400 pictures from a custom dataset of historical yearbook images from archive.org. (The segmentation model was trained before the author discovered the well-curated Ginosar et al. yearbook dataset used for this story. Kids, always try Googling a few different ways before building your own dataset!) An augmented training set of Figaro1K images and the hand-labeled yearbook photos was used to train the U-Net initially. After training, test images that the U-Net had segmented successfully were manually selected and added to the training set for another pass. This ultimately yielded a final model trained on 3, 90 images with segmentation masks.
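Because each training image comes paired with a hand-drawn mask, any geometric augmentation has to be applied identically to both, while photometric changes such as brightness and contrast should touch only the image. The sketch below assumes grayscale images scaled to [0, 1]; the specific transforms and their ranges are hypothetical, since the story does not list the augmentations it used.

```python
import numpy as np

def augment_pair(image: np.ndarray, mask: np.ndarray, rng: np.random.Generator):
    """Randomly augment an (image, hair mask) pair without misaligning them.

    Hypothetical transforms: a horizontal flip (applied to both arrays) and a
    brightness/contrast jitter (applied to the image only).
    """
    # Geometric transform: must be identical for the image and its mask.
    if rng.random() < 0.5:
        image = image[:, ::-1]
        mask = mask[:, ::-1]
    # Photometric transform: changes pixel values, not which pixels are hair.
    gain = rng.uniform(0.8, 1.2)   # contrast jitter
    bias = rng.uniform(-0.1, 0.1)  # brightness jitter
    image = np.clip(image * gain + bias, 0.0, 1.0)
    return image, mask

rng = np.random.default_rng(0)
mask = np.array([[1, 0, 0], [1, 1, 0]], dtype=bool)
image = mask.astype(float)
aug_image, aug_mask = augment_pair(image, mask, rng)
```

Jittering contrast and brightness is one plausible way to make modern Figaro1K photos look a little more like older yearbook prints, though it cannot reproduce every difference the story mentions.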
The trained model was then used to generate hair maps for all images in the Ginosar et al. yearbook dataset.
Example images and hair maps:
There were several “failure modes” that proved challenging for the classifier.
At this point, it was clear that, due to lighting and composition choices common in yearbook photos from the s and earlier, the U-Net especially struggled to segment hair in these images. We therefore restricted the next stages of our analysis to images from the year onward. This left 42, images, or roughly 4, images per decade on average.
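The filtering step and the per-decade average can be sketched as follows. The record layout (a dict with a "year" key) is hypothetical, and the cutoff is left as a parameter; the 1950 used in the toy call below is a placeholder, not the cutoff used in the story.

```python
from collections import Counter

def filter_and_count(records, min_year):
    """Keep records from min_year onward; count them per decade and average."""
    kept = [r for r in records if r["year"] >= min_year]
    # Map each year to the start of its decade, e.g. 1957 -> 1950.
    per_decade = Counter((r["year"] // 10) * 10 for r in kept)
    avg_per_decade = len(kept) / len(per_decade) if per_decade else 0.0
    return kept, dict(sorted(per_decade.items())), avg_per_decade

# Toy records; real ones would come from the yearbook dataset's metadata.
records = [{"year": y} for y in (1938, 1944, 1951, 1957, 1963)]
kept, per_decade, avg = filter_and_count(records, min_year=1950)
print(len(kept), per_decade, avg)
```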
To analyze changes in hairstyles, we needed a way to summarize hair features appropriately. After performing hair segmentation on each image in the dataset, we have a "hair map" that expresses the probability that each pixel contains hair. Next, we used a deep learning approach called a Variational Autoencoder (VAE) to summarize each hair map in four coordinates. A VAE is a neural network that learns to compress each input into a small set of latent coordinates from which the input can be reconstructed; with only four numbers available per image, the model tends to spend them on the features that vary most across the dataset, such as hair length (from Morticia-Addams-long to military-buzz-cut-short) and height (from buzz to beehive). Because the VAE is trained on hair maps rather than the original yearbook photos, the results are less influenced by unrelated features such as skin color, facial features, or image grain. A downside of this approach is that hair maps lose color and textural detail, so the VAE is largely insensitive to trends in hair color or subtle changes in texture. The script to train the VAE is hosted on GitHub.
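The pieces that make a VAE different from a plain autoencoder are the sampling step and the extra KL penalty in its loss. Below is a NumPy sketch of those pieces for a four-dimensional latent space, assuming the decoder outputs per-pixel hair probabilities and the loss uses binary cross-entropy; the encoder and decoder networks themselves are omitted, and this is not the story's training script.

```python
import numpy as np

LATENT_DIM = 4  # each hair map is summarized by four coordinates

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I).

    In an autodiff framework this form keeps z differentiable in mu and log_var.
    """
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over latent dims."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=-1)

def vae_loss(hair_map, reconstruction, mu, log_var, eps=1e-7):
    """Per-image loss: pixelwise binary cross-entropy plus the KL term.

    hair_map and reconstruction have shape (batch, height, width), values in [0, 1].
    """
    r = np.clip(reconstruction, eps, 1 - eps)  # avoid log(0)
    bce = -np.sum(hair_map * np.log(r) + (1 - hair_map) * np.log(1 - r), axis=(-2, -1))
    return bce + kl_to_standard_normal(mu, log_var)
```

After training, a common choice (assumed here) is to use the encoder's mean vector `mu` as each image's four coordinates, since it is deterministic for a given hair map.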
The complete set of scripts for the analyses conducted here is hosted on GitHub, along with the raw hair maps and the set of four features for each image.
Mullet, beehive and straight hair analysis:
After being trained on a small subset of labeled images from the dataset, the classifiers were used to identify target hairstyles in the rest of the dataset. Manual inspection was used to confirm the machine's selections (note that category boundaries can be challenging and at times subjective: the difference between a beehive and a bouffant is sometimes narrow). Once incidences of each target look were confirmed, the proportion of each look per year was calculated.
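The final step, turning confirmed labels into a yearly proportion, is simple counting. This sketch assumes each portrait contributes one (year, look) pair, with `None` for portraits that show none of the target looks, and that the denominator is every analyzed portrait from that year; the story does not spell out the denominator.

```python
from collections import Counter, defaultdict

def proportion_per_year(labels):
    """Map each look to {year: share of that year's portraits showing the look}."""
    totals = Counter()            # portraits per year
    hits = defaultdict(Counter)   # look -> confirmed portraits per year
    for year, look in labels:
        totals[year] += 1
        if look is not None:
            hits[look][year] += 1
    years = sorted(totals)
    # Counter returns 0 for missing years, so absent looks get a 0.0 share.
    return {look: {y: counts[y] / totals[y] for y in years}
            for look, counts in sorted(hits.items())}

labels = [(1964, "beehive"), (1964, None), (1964, None), (1964, "beehive"),
          (1988, "mullet"), (1988, None)]
shares = proportion_per_year(labels)
```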
One thing we do attempt to estimate is how much the hairstyles in male- and female-presenting portraits have diverged and overlapped over time. To do this, we use the four-dimensional coordinates assigned to each hair map as features for distinguishing the looks belonging to each class (male- and female-tagged images). For every year from – , a random forest classifier was trained on the images from that year with 5-fold cross-validation. The results of these classifiers are plotted over time with LOESS smoothing.
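Per-year cross-validated scores could be obtained, for example, with scikit-learn's `cross_val_score(RandomForestClassifier(), X, y, cv=5)`; we do not know which tools the story used. The smoothing step is sketched below as a minimal degree-1 LOESS (tricube weights, no robustness iterations) written from scratch in NumPy, so the `frac` default is an assumption rather than the story's setting.

```python
import numpy as np

def loess(x, y, frac=0.5, x_eval=None):
    """Locally weighted linear regression (LOESS, degree 1, tricube weights).

    frac is the fraction of points used in each local fit.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_eval = x if x_eval is None else np.asarray(x_eval, dtype=float)
    n = len(x)
    k = max(2, int(np.ceil(frac * n)))  # neighbours in each local fit
    fitted = np.empty(len(x_eval))
    for i, x0 in enumerate(x_eval):
        dist = np.abs(x - x0)
        h = max(np.sort(dist)[k - 1], 1e-12)  # bandwidth: distance to k-th nearest point
        w = np.clip(1.0 - (dist / h) ** 3, 0.0, None) ** 3  # tricube weights
        # Weighted least squares for y ~ a + b * (x - x0); the fit at x0 is a.
        sw = np.sqrt(w)
        design = np.column_stack([np.ones(n), x - x0]) * sw[:, None]
        coef = np.linalg.lstsq(design, y * sw, rcond=None)[0]
        fitted[i] = coef[0]
    return fitted

# Usage with hypothetical yearly accuracy scores:
years = np.arange(1990, 2000)
scores = 0.8 + 0.01 * (years - 1990)
smooth = loess(years, scores, frac=0.5)
```

Because each local fit is a straight line, LOESS reproduces a linear trend exactly while still bending to follow slower, nonlinear changes over the decades.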
Hair size over time: