The importance of data in machine learning models: Is there a magic number that determines the amount of training data required?


This is perhaps one of the hardest questions to answer while a machine learning project is under way: how many examples does a training dataset need for the resulting model to perform well?

Let’s take deep learning, one of the main approaches to have boomed in recent years, as a case study for this question, focusing on image classification models.

Two main reasons explain the resurgence of this area in recent years. The first is the increase in data availability, driven largely by the emergence of the Internet, which has made it easier to collect and distribute large data sets, and by people’s interaction with new digital devices (laptops, mobile devices). The second is the growth of available computational resources, which makes it possible to run increasingly complex models and has enabled new techniques for training deep networks [2, 1].

We speak of a resurgence because commercial applications have been using deep learning since the 1990s, when it was considered more of an art than a technology and could only be applied by an expert with a specific set of skills for getting good performance out of the algorithm. That specialized knowledge is still required today, but the amount of skill needed decreases as the amount of training data increases [2].

Figure 1 (taken from [3]) shows an example of how the performance of different types of algorithms evolves as the amount of training data increases.
With older or more traditional learning algorithms (such as linear or logistic regression), performance usually plateaus: the learning curve flattens out and the algorithm stops improving even when it is given more data [3].
By contrast, a small neural network (with few hidden layers and units) trained on the same supervised learning task is likely to perform somewhat better, and the improvement can be considerably larger when a bigger (deeper) neural network is trained, that is, when the complexity of the model is increased [3].
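
The plateau described above can be reproduced on a toy problem. The following is a minimal sketch, not the experiment behind Figure 1: it assumes a synthetic one-dimensional regression task (a noisy sine wave) and uses a straight-line fit and a k-nearest-neighbour regressor as stand-ins for a low-capacity and a higher-capacity model, rather than actual neural networks. All function names, sizes, and the noise level are illustrative choices.

```python
import heapq
import math
import random


def make_data(n, rng):
    """Draw n noisy samples of y = sin(x) on [0, 2*pi] (synthetic task)."""
    xs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]
    ys = [math.sin(x) + rng.gauss(0.0, 0.1) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b (low-capacity stand-in)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_knn(xs, ys, k=5):
    """k-nearest-neighbour regressor (higher-capacity stand-in)."""
    points = list(zip(xs, ys))

    def predict(x):
        nearest = heapq.nsmallest(k, points, key=lambda p: abs(p[0] - x))
        return sum(y for _, y in nearest) / len(nearest)

    return predict


def mse(model, xs, ys):
    """Mean squared error of a model on held-out data."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def learning_curves(sizes=(10, 100, 1000), seed=0):
    """Test error of both models for increasing training-set sizes."""
    rng = random.Random(seed)
    test_x, test_y = make_data(300, rng)
    curves = {"linear": [], "knn": []}
    for n in sizes:
        train_x, train_y = make_data(n, rng)
        curves["linear"].append(mse(fit_line(train_x, train_y), test_x, test_y))
        curves["knn"].append(mse(fit_knn(train_x, train_y), test_x, test_y))
    return curves


curves = learning_curves()
for name, errors in curves.items():
    print(name, [round(e, 3) for e in errors])
```

In this setup the straight line cannot represent the sine wave however many samples it sees, so its test error stalls, while the more flexible model keeps benefiting from additional data.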

There are some rules of thumb that attempt to give objective values. In [2], for example, the authors state that, as of 2016, a supervised deep learning algorithm will generally achieve acceptable performance with around 5,000 labeled examples per category.
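
As a quick illustration of how such a rule of thumb might be applied, the sketch below counts the examples per class in a labeled dataset and reports how far each class is from the 5,000 mark. The dataset and the helper name `shortfall_per_class` are hypothetical, and the threshold is only the starting point quoted from [2], not a guarantee of good performance.

```python
from collections import Counter

# Rule of thumb quoted from [2]; a starting point, not a guarantee.
EXAMPLES_PER_CLASS = 5000


def shortfall_per_class(labels, target=EXAMPLES_PER_CLASS):
    """Return how many more examples each class needs to reach the target."""
    counts = Counter(labels)
    return {cls: max(0, target - n) for cls, n in counts.items()}


# Hypothetical label list for a three-class image dataset.
labels = ["cat"] * 6200 + ["dog"] * 4100 + ["bird"] * 900
gaps = shortfall_per_class(labels)
print(gaps)  # {'cat': 0, 'dog': 900, 'bird': 4100}
```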

In practice, however, it is very difficult, if not virtually impossible, to determine the ideal size of a data set in advance and with complete certainty.
Some of the factors that influence how much data is needed to train a model that performs well are:

● The complexity of the learning task. In image classification, for example, this depends on how different the classes or categories are from one another and on the conditions under which the images are captured (the amount of noise in the image).
● Which data augmentation techniques can be applied to the data.
● Whether pre-trained models exist and can be used (transfer learning), that is, whether part of the weights of a model trained on a similar task can be reused.
● The type of input data and its dimension or size.
● Any preprocessing applied to the data (e.g., dimensionality reduction).
● The complexity of the model, which is determined by its architecture.
● The quality of the data, which may hurt performance if the data are excessively noisy or don’t contain the information needed to predict the desired outcome. It is also important that the training environment matches the deployment environment, meaning that the training data are similar in context to the inputs the model will receive once deployed.
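
To make the data augmentation factor in the list above concrete, here is a minimal sketch that produces extra training variants of an image through flips and a rotation. It assumes an image is represented as a plain list of pixel rows and that the label is unchanged by these transformations, which does not hold for every task (for instance, mirrored digits or text). The function names are illustrative and not taken from any library.

```python
def flip_horizontal(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def flip_vertical(image):
    """Mirror the image top to bottom."""
    return [row[:] for row in image[::-1]]


def rotate_90(image):
    """Rotate the image clockwise by 90 degrees."""
    return [list(row) for row in zip(*image[::-1])]


def augment(image):
    """Return the original image plus three transformed variants."""
    return [
        image,
        flip_horizontal(image),
        flip_vertical(image),
        rotate_90(image),
    ]


# A tiny 2x2 "image" with distinct pixel values for easy inspection.
image = [[1, 2],
         [3, 4]]
variants = augment(image)
for variant in variants:
    print(variant)
```

Each original example yields four training samples here, which increases the effective size of the dataset without collecting new data.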

Overfitting is an indicator that more data may need to be collected. However, particularly when enlarging the data set is difficult because of the nature of the problem, some strategies can be tried first to improve how well the model generalizes, such as using pre-trained models (transfer learning), reducing the complexity of the model, or adding regularization.
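
One common way to spot overfitting is to compare training and validation loss over time. The sketch below is a simplified, early-stopping-style check under stated assumptions: the loss histories are hypothetical per-epoch values, the function name `detect_overfitting` and the `patience` threshold are illustrative, and real projects would normally rely on the monitoring tools of their training framework.

```python
def detect_overfitting(train_loss, val_loss, patience=3):
    """Return the epoch with the best validation loss if overfitting is seen.

    Overfitting is flagged when validation loss has not improved for
    `patience` epochs while training loss is still lower than it was at
    the best validation epoch. Returns None if no such pattern appears.
    """
    best_epoch = 0
    for epoch in range(1, len(val_loss)):
        if val_loss[epoch] < val_loss[best_epoch]:
            best_epoch = epoch
        elif (epoch - best_epoch >= patience
              and train_loss[epoch] < train_loss[best_epoch]):
            return best_epoch
    return None


# Hypothetical loss histories: training keeps improving, validation turns up.
train = [1.0, 0.8, 0.6, 0.5, 0.4, 0.3, 0.2]
val_overfit = [1.1, 0.9, 0.8, 0.85, 0.9, 0.95, 1.0]
val_healthy = [1.1, 0.9, 0.8, 0.7, 0.65, 0.6, 0.58]

print(detect_overfitting(train, val_overfit))  # 2
print(detect_overfitting(train, val_healthy))  # None
```

When the check fires, the options mentioned above (more data, transfer learning, a simpler model, regularization) are the usual candidates to try.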

In conclusion, although training a large model (a deep network) on a large amount of data is perhaps one of the most reliable ways to improve the performance of an algorithm, it is important to analyze each case individually. Start from a reasonably sized training set (using general rules of thumb as a starting point) that is of good quality and consistent with the target task, knowing that the optimal amount of data for the expected performance will ultimately depend on several factors.

[1] Chollet, François. Deep Learning with Python. Manning, 2017.
[2] Goodfellow, Ian, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016.
[3] Ng, Andrew. Machine Learning Yearning: Technical Strategy for AI Engineers in the Era of Deep Learning. 2019.

