Option 1: Programming Project - FeedForward-Backpropagation Neural Network with vs. without an Autoencoder Layer

From Hande Celikkanat

This project is about implementing a multi-layered feedforward-backpropagation network for classification. Your network will take image files as input and perform a multi-class decision on them. In particular, you will compare a standard feedforward-backpropagation network with a network that is augmented with an autoencoder.

Observe what happens when you train networks with 0 hidden layers, 1 hidden layer, 2 hidden layers, and more. How does the performance of networks with many hidden layers (i.e., "deeper" networks) compare with that of shallower ones? Why do you think that is?
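As a starting point, a feedforward network with a configurable number of hidden layers can be written in a few dozen lines of numpy. The sketch below is a minimal illustration, not the mandatory template code; all names, sizes, and the XOR toy data are assumptions chosen to show why 0 hidden layers can be qualitatively weaker than 1.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_layers(sizes):
    # sizes = [n_inputs, hidden_1, ..., n_outputs]
    return [(rng.normal(0.0, 1.0, (a, b)), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]

def forward(layers, x):
    acts = [x]
    for W, b in layers:
        acts.append(sigmoid(acts[-1] @ W + b))
    return acts

def train_step(layers, x, y, lr=0.5):
    # one full-batch gradient step on the mean squared error
    acts = forward(layers, x)
    delta = (acts[-1] - y) * acts[-1] * (1 - acts[-1])
    for i in reversed(range(len(layers))):
        W, b = layers[i]
        new_delta = (delta @ W.T) * acts[i] * (1 - acts[i])
        layers[i] = (W - lr * acts[i].T @ delta, b - lr * delta.sum(0))
        delta = new_delta
    return ((acts[-1] - y) ** 2).mean()

# XOR toy data: not linearly separable, so a 0-hidden-layer
# network cannot solve it, while one hidden layer can.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
Y = np.array([[0.], [1.], [1.], [0.]])

net0 = init_layers([2, 1])       # 0 hidden layers
net1 = init_layers([2, 8, 1])    # 1 hidden layer of 8 units
for _ in range(10000):
    loss0 = train_step(net0, X, Y)
    loss1 = train_step(net1, X, Y)
print(round(loss0, 3), round(loss1, 3))
```

Swapping the `sizes` list (e.g. `[2, 8, 8, 1]`) is enough to rerun the same comparison with deeper architectures.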

An interesting solution to this problem is to first train an "autoencoder", which will act as the first hidden layer. First, train an autoencoder that takes an image as input, reduces it to some representation in its hidden layer, and then outputs the exact same image. Then, use this trained network as your first hidden layer, and train the further layers with the standard supervised backpropagation algorithm. Do you observe a performance difference?
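The two-stage procedure can be sketched as follows. This is a hedged illustration with synthetic low-rank data standing in for images; the sizes, learning rate, and the choice to keep the encoder fixed in stage 2 (rather than fine-tuning it) are assumptions, not requirements of the assignment.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(a):
    e = np.exp(a - a.max(1, keepdims=True))
    return e / e.sum(1, keepdims=True)

# Synthetic "images": a 1-D latent intensity z plus noise; the class
# label depends only on z, so a good hidden code should preserve it.
n, n_in, n_hid = 200, 16, 6
z = rng.random((n, 1))
X = np.clip(z @ rng.random((1, n_in)) + 0.05 * rng.normal(size=(n, n_in)), 0, 1)
Y = np.eye(2)[(z[:, 0] > 0.5).astype(int)]       # one-hot labels
lr = 0.5

# Stage 1: train an input -> hidden -> input autoencoder.
We = rng.normal(0, 0.3, (n_in, n_hid)); be = np.zeros(n_hid)
Wd = rng.normal(0, 0.3, (n_hid, n_in)); bd = np.zeros(n_in)
for _ in range(2000):
    H = sigmoid(X @ We + be)                      # encoding
    R = sigmoid(H @ Wd + bd)                      # reconstruction
    dR = (R - X) * R * (1 - R) / n
    dH = (dR @ Wd.T) * H * (1 - H)
    Wd -= lr * H.T @ dR; bd -= lr * dR.sum(0)
    We -= lr * X.T @ dH; be -= lr * dH.sum(0)

# Stage 2: discard the decoder, keep the encoder (We, be) as the first
# hidden layer, and train a softmax output layer on top of it.
Wc = rng.normal(0, 0.3, (n_hid, 2)); bc = np.zeros(2)
for _ in range(2000):
    H = sigmoid(X @ We + be)
    P = softmax(H @ Wc + bc)
    dP = (P - Y) / n                              # cross-entropy gradient
    Wc -= lr * H.T @ dP; bc -= lr * dP.sum(0)

P = softmax(sigmoid(X @ We + be) @ Wc + bc)
acc = (P.argmax(1) == Y.argmax(1)).mean()
print(round(acc, 2))
```

A natural follow-up experiment is to also backpropagate through `We, be` in stage 2 and compare against the frozen-encoder variant.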

For this project, you are expected to experiment on the differences between the "standard" version and the "autoencoder" version. In addition, please design and run your own experiments. Examples include, but are not restricted to: How does the number of layers affect performance? Do you observe a strong improvement in performance in the transition from the 0-hidden-layer case to the 1-hidden-layer case? Between the 1-hidden-layer and 2-hidden-layer cases? Does the number of neurons in the hidden layer(s) affect performance significantly? Is it better or worse to provide more hidden neurons than there are neurons in the input layer? How do changes in the network structure reflect on the time performance, and is this compatible with your expectations given your knowledge of the structure of the networks?
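For the time-performance question, note that the cost of one forward or backward pass in a fully connected network scales roughly with the number of trainable parameters, which you can compute directly from the layer widths. The layer sizes below (28x28 inputs, 10 classes) are illustrative assumptions, not a prescribed dataset.

```python
def n_params(sizes):
    """Trainable parameters (weights + biases) of a fully connected
    network whose layer widths are given by `sizes`, input first."""
    return sum(a * b + b for a, b in zip(sizes[:-1], sizes[1:]))

print(n_params([784, 10]))            # 0 hidden layers: 7850
print(n_params([784, 100, 10]))       # 1 hidden layer:  79510
print(n_params([784, 100, 100, 10]))  # 2 hidden layers: 89610
```

Comparing these counts against your measured training times per epoch is one way to check whether the observed slowdowns match expectations.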

Report on the effects. We encourage you to try to get a feeling for the strong and weak points of the respective approaches. Do you encounter any behavior that seems contrary to the literature? Why do you think that happens? You are welcome to report failures as well as successes; please do comment on the most likely reasons for the failures.

Also experiment with the sensitivity of the network to the learning parameters. Note that this sensitivity can be especially strong in the autoencoder case; we encourage you to try different parameters in order to observe the difference in performance.
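A sensitivity experiment can be as simple as sweeping the learning rate over a grid and recording the final loss for each setting. The sketch below does this for a small logistic-regression problem standing in for your network; the grid, data, and epoch count are placeholder assumptions you would replace with your own model and parameters.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3))
w_true = np.array([2.0, -1.0, 0.5])
Y = (X @ w_true > 0).astype(float)        # synthetic binary labels

def final_loss(lr, epochs=500):
    """Cross-entropy loss after full-batch gradient descent at rate lr."""
    w = np.zeros(3)
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        w -= lr * X.T @ (p - Y) / len(X)  # mean cross-entropy gradient
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    return -np.mean(Y * np.log(p + 1e-9) + (1 - Y) * np.log(1 - p + 1e-9))

for lr in (0.001, 0.01, 0.1, 1.0):
    print(lr, round(final_loss(lr), 4))
```

Plotting final loss (or accuracy) against the learning rate for both the standard and the autoencoder version makes the sensitivity difference between the two easy to see.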

Possible datasets to choose from: (You are welcome to propose a dataset of your own choice.)

Template code: <Links here> (Mandatory to use; please see below.)