What Is The Recommended Minimum Training Data Set Size To Train A Deep Neural Network?
Choosing an acceptable data set size for your deep neural network (DNN) is important, but it is hard to estimate what counts as acceptable.
This is because every deep learning problem is different, and even within the same type of problem, the required data set size can vary.
For example, consider the problem of identifying a particular car model from a pool of cars.
Possible data set variations
Black-and-white images of only the model you’re looking for
Colored images of only the model you’re looking for
Black-and-white images of all car models
Colored images of all car models
You might guess that the black-and-white data sets will be smaller than the colored ones, but we still can’t say by how much.
In general, you can make an educated guess about how large the data set should be. You’ll come across concepts during your deep learning training that help you make that guess. Let’s look at a few of them.
Using Statistical Heuristics
A statistical heuristic estimates the data set size from statistical parameters of the data and a general understanding of the problem.
Every data set divides its data into classes, and each class has statistical properties (such as how many items it contains) that you can use.
For example, you can require a fixed number of items per statistical parameter. A simple version is to base the size on the number of classes: give each class at least “n” items, so the total is a multiple of the class count.
So, if you have 100 classes and n = 1000, you need 100 × 1,000 = 100,000 data items in your set. Treat this as a starting point, not a final answer.
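As a minimal sketch of this heuristic, the first helper below multiplies the class count by the per-class floor, and the second checks an existing labeled set against that floor. The function names, the car-model labels and the counts are hypothetical, chosen only for illustration.

```python
from collections import Counter


def estimate_min_size(num_classes, items_per_class):
    """Starting-point estimate: every class gets at least `items_per_class` items."""
    if num_classes <= 0 or items_per_class <= 0:
        raise ValueError("num_classes and items_per_class must be positive")
    return num_classes * items_per_class


def class_shortfalls(labels, items_per_class):
    """Extra items each under-represented class needs to reach the per-class floor.

    Only classes that appear in `labels` are checked; a class with no items at all
    will not show up here.
    """
    counts = Counter(labels)
    return {label: items_per_class - c for label, c in counts.items() if c < items_per_class}


print(estimate_min_size(100, 1000))  # 100000

# Hypothetical labels for the car-model example.
labels = ["sedan"] * 1200 + ["coupe"] * 400 + ["hatchback"] * 950
print(class_shortfalls(labels, 1000))  # {'coupe': 600, 'hatchback': 50}
```

The floor of “n” items per class is the assumption doing all the work here, so it is worth revisiting once you have learning-curve results.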
Ultimately, your goal is to collect enough data for your DNN to perform well. So, we’ll use a bottom-up approach and check how well the algorithm performs on samples of different sizes.
You can conduct a small study that measures your algorithm’s performance at several training data set sizes and plot the results:
Training data set size = X-axis
Model skill (algorithm’s performance) = Y-axis
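A small study of this kind can be sketched in plain Python. Training a real DNN is out of scope for a short example, so this sketch swaps in a simple nearest-centroid classifier on hypothetical two-class toy data; `make_data`, `fit_centroids`, `accuracy`, `learning_curve` and all sizes are illustrative assumptions. In practice you would replace the fitting and scoring steps with training and evaluating your own network.

```python
import random
import statistics


def make_data(n_per_class, rng):
    """Hypothetical toy data: two 2-D Gaussian blobs, labels 0 and 1."""
    data = []
    for label, centre in ((0, -1.5), (1, 1.5)):
        for _ in range(n_per_class):
            data.append(((rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)), label))
    return data


def fit_centroids(train):
    """Stand-in 'model': the mean feature vector of each class."""
    sums = {}
    for (x, y), label in train:
        sx, sy, n = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sx + x, sy + y, n + 1)
    return {label: (sx / n, sy / n) for label, (sx, sy, n) in sums.items()}


def accuracy(centroids, test):
    """Fraction of test points assigned to the class with the nearest centroid."""
    correct = 0
    for (x, y), label in test:
        pred = min(centroids, key=lambda c: (x - centroids[c][0]) ** 2 + (y - centroids[c][1]) ** 2)
        correct += pred == label
    return correct / len(test)


def learning_curve(pool, test, sizes_per_class, repeats=20, seed=0):
    """Average held-out accuracy (model skill) at each training data set size."""
    rng = random.Random(seed)
    by_class = {}
    for example in pool:
        by_class.setdefault(example[1], []).append(example)
    curve = []
    for n in sizes_per_class:
        scores = []
        for _ in range(repeats):
            # Draw n examples per class from the pool so every class is represented.
            subset = [ex for examples in by_class.values() for ex in rng.sample(examples, n)]
            scores.append(accuracy(fit_centroids(subset), test))
        curve.append((n * len(by_class), statistics.mean(scores)))
    return curve


rng = random.Random(42)
pool = make_data(200, rng)      # the data you already have
held_out = make_data(500, rng)  # evaluation set, never used for training
for size, skill in learning_curve(pool, held_out, [1, 2, 5, 10, 50, 200]):
    print(f"{size:4d} examples -> accuracy {skill:.3f}")
```

Each `(size, skill)` pair is one point on the graph; you could plot them with a library such as matplotlib. Averaging over several random subsets per size smooths out the noise that small samples produce.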
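The resulting graph shows how performance changes as the data set grows. By extrapolating its slope, you can estimate how much data the model needs to reach a target performance, or where adding more data stops helping much.
This graph is often called a learning curve, and it’s an educated way to estimate the size.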
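One way to do the extrapolation, sketched here under an explicit assumption, is to model the error (1 − accuracy) as a power law, error ≈ a · n^(−b). Learning curves are often modeled this way, but nothing guarantees your model follows it. On a log-log plot that law is a straight line with slope −b, so fitting a line there and solving for the target error gives a size estimate. The helper names and the measurements below are hypothetical.

```python
import math


def fit_power_law(sizes, errors):
    """Least-squares fit of log(error) = log(a) - b * log(n); returns (a, b)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e) for e in errors]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    # Slope of the straight line through the points on a log-log plot.
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return math.exp(my - slope * mx), -slope


def size_for_target_error(a, b, target_error):
    """Invert error = a * n ** (-b) to estimate the data set size for a target error."""
    if b <= 0:
        raise ValueError("error does not fall as data grows; nothing to extrapolate")
    return math.ceil((a / target_error) ** (1 / b))


# Hypothetical measurements (error = 1 - accuracy) that follow error = 2.0 * n ** -0.5 exactly.
sizes = [100, 400, 1600, 6400]
errors = [2.0 * n ** -0.5 for n in sizes]
a, b = fit_power_law(sizes, errors)
print(round(a, 3), round(b, 3))  # 2.0 0.5
print(size_for_target_error(a, b, 0.01))
```

With these numbers the estimate comes out at about 40,000 examples, since (2.0 / 0.01)² = 40,000. Treat such figures cautiously: a pure power law ignores the plateau that real learning curves eventually reach, and the further you extrapolate beyond the sizes you measured, the less reliable the estimate becomes.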
Refer to Previously Published Studies
Data science is an active field, and new studies are published all the time. People who learn deep learning also carry out studies; in fact, you can perform such studies in our Data Science course at Imarticus.
You can use these studies as reference points when estimating a good data set size. If a study falls within the domain of your problem, it’ll be much easier to make a good estimate. Just make sure you account for differences in the complexity and type of your problem.
Getting the data set size right is largely an empirical skill. Until you develop it, you can use these tactics to start building your first deep neural networks. As your deep learning training progresses, you’ll get a better grasp of the problems related to data set size.