[ { "start": 1.17, "text": "[Music]" }, { "start": 10.28, "text": "good afternoon everyone and welcome to" }, { "start": 12.88, "text": "MIT sus1 191 my name is Alexander amini" }, { "start": 16.84, "text": "and I'll be one of your instructors for" }, { "start": 18.32, "text": "the course this year along with Ava and" }, { "start": 21.56, "text": "together we're really excited to welcome" }, { "start": 23.359, "text": "you to this really incredible course" }, { "start": 25.16, "text": "this is a very fast-paced and very uh" }, { "start": 29.24, "text": "intense one week that we're about to go" }, { "start": 32.079, "text": "through together right so we're going to" }, { "start": 33.559, "text": "cover the foundations of a also very" }, { "start": 36.52, "text": "fast-paced moving field and a field that" }, { "start": 39.239, "text": "has been rapidly changing over the past" }, { "start": 41.96, "text": "eight years that we have taught this" }, { "start": 43.719, "text": "course at MIT now over the past decade" }, { "start": 48.36, "text": "in fact even before we started teaching" }, { "start": 50.48, "text": "this course Ai and deep learning has" }, { "start": 52.8, "text": "really been revolutionizing so many" }, { "start": 55.6, "text": "different advances and so many different" }, { "start": 58.359, "text": "areas of science meth mathematics" }, { "start": 60.519, "text": "physics and and so on and not that long" }, { "start": 63.879, "text": "ago we were having new types of we were" }, { "start": 67.159, "text": "having challenges and problems that we" }, { "start": 70.36, "text": "did not think were necessarily solvable" }, { "start": 72.92, "text": "in our lifetimes that AI is now actually" }, { "start": 75.799, "text": "solving uh Beyond human performance" }, { "start": 79.6, "text": "today and each year that we teach this" }, { "start": 82.52, "text": "course uh this lecture in particular is" }, { "start": 85.72, "text": "getting harder and harder to teach" }, { "start": 87.72, "text": "because for an introductory level course" }, { "start": 90.92, "text": "this lecture lecture number one is the" }, { "start": 93.28, "text": "lecture that's supposed to cover the" }, { "start": 94.36, "text": "foundations and if you think to any" }, { "start": 96.36, "text": "other introductory course like a" }, { "start": 98.64, "text": "introductory course 101 on mathematics" }, { "start": 101.36, "text": "or biology those lecture ones don't" }, { "start": 103.84, "text": "really change that much over time but" }, { "start": 106.24, "text": "we're in a rapidly changing field of AI" }, { "start": 108.799, "text": "and deep learning where even these types" }, { "start": 112.0, "text": "of lectures are rapidly changing so let" }, { "start": 115.6, "text": "me give you an example of how we" }, { "start": 117.24, "text": "introduced this course only a few years" }, { "start": 119.56, "text": "ago" }, { "start": 121.68, "text": "hi everybody and welcome to MIT 6s" }, { "start": 126.32, "text": "one91 the official introductory course" }, { "start": 129.72, "text": "on deep learning taught here at" }, { "start": 133.44, "text": "MIT deep learning is revolutionizing so" }, { "start": 137.44, "text": "many fields from robotics to medicine" }, { "start": 141.28, "text": "and everything in" }, { "start": 143.2, "text": "between you'll learn the fundamentals of" }, { "start": 146.599, "text": "this field and how you can build so" }, { "start": 150.12, "text": "these incredible" }, { "start": 152.44, "text": 
"algorithms in fact this entire speech" }, { "start": 156.319, "text": "and in video are not real and were" }, { "start": 159.84, "text": "created using deep learning and" }, { "start": 162.72, "text": "artificial" }, { "start": 164.8, "text": "intelligence and in this class you'll" }, { "start": 167.4, "text": "learn how it has been an honor to speak" }, { "start": 170.92, "text": "with you today and I hope you enjoy the" }, { "start": 176.92, "text": "course the really surprising thing about" }, { "start": 180.64, "text": "that video to me when we first did it" }, { "start": 183.68, "text": "was how viral it went a few years ago so" }, { "start": 187.04, "text": "just in a couple months of us teaching" }, { "start": 189.04, "text": "this course a few years ago that video" }, { "start": 191.08, "text": "went very viral right it got over a" }, { "start": 193.4, "text": "million views within only a few months" }, { "start": 196.2, "text": "uh people were shocked with a few things" }, { "start": 198.599, "text": "but the main one was the realism of AI" }, { "start": 202.36, "text": "to be able to generate content that" }, { "start": 205.64, "text": "looks and sounds extremely" }, { "start": 208.36, "text": "hyperrealistic" }, { "start": 209.959, "text": "right and when we did this video when we" }, { "start": 212.239, "text": "created this for the class only a few" }, { "start": 214.48, "text": "years ago this video took us about" }, { "start": 217.159, "text": "$10,000 and compute to generate just" }, { "start": 219.72, "text": "about a minute long video extremely I" }, { "start": 222.2, "text": "mean if you think about it I would say" }, { "start": 223.64, "text": "it's extremely expensive to compute" }, { "start": 225.76, "text": "something what we look at like that and" }, { "start": 227.84, "text": "maybe a lot of you are not really even" }, { "start": 229.239, "text": "impressed by the technology today" }, { "start": 231.159, "text": "because you see all of the amazing" }, { "start": 232.599, "text": "things that Ai and deep learning are" }, { "start": 235.439, "text": "producing now fast forward today the" }, { "start": 238.4, "text": "progress in deep learning yeah and" }, { "start": 240.2, "text": "people were making all kinds of you know" }, { "start": 242.72, "text": "exciting remarks about it when it came" }, { "start": 244.48, "text": "out a few years ago now this is common" }, { "start": 246.12, "text": "stuff because AI is really uh doing much" }, { "start": 249.319, "text": "more powerful things than this fun" }, { "start": 251.76, "text": "little introductory video so today fast" }, { "start": 255.92, "text": "forward four years about yeah four years" }, { "start": 259.0, "text": "to today right now where are we AI is" }, { "start": 261.799, "text": "now generating content with deep" }, { "start": 264.84, "text": "learning being so commoditized right" }, { "start": 267.56, "text": "deep learning is in all of our" }, { "start": 269.039, "text": "fingertips now online in our smartphones" }, { "start": 272.52, "text": "and so on in fact we can use deep" }, { "start": 275.6, "text": "learning to generate these types of" }, { "start": 279.24, "text": "hyperrealistic pieces of media and" }, { "start": 281.72, "text": "content entirely from English language" }, { "start": 284.56, "text": "without even coding anymore right so" }, { "start": 286.8, "text": "before we had to actually go in train" }, { "start": 288.44, "text": "these models and and really code them to" }, { "start": 
291.24, "text": "be able to create that one minute long" }, { "start": 293.32, "text": "video today we have models that will do" }, { "start": 295.88, "text": "that for us end to end directly from" }, { "start": 298.44, "text": "English language so we can these models" }, { "start": 300.68, "text": "to create something that the world has" }, { "start": 302.28, "text": "never seen before a photo of an" }, { "start": 304.24, "text": "astronaut riding a horse and these" }, { "start": 306.16, "text": "models can imagine those pieces of" }, { "start": 308.72, "text": "content entirely from scratch my" }, { "start": 311.72, "text": "personal favorite is actually how we can" }, { "start": 313.24, "text": "now ask these deep learning models to uh" }, { "start": 317.12, "text": "create new types of software even" }, { "start": 319.36, "text": "themselves being software to ask them to" }, { "start": 321.72, "text": "create for example to write this piece" }, { "start": 324.12, "text": "of tensorflow code to train a neural" }, { "start": 327.199, "text": "network right we're asking a neural" }, { "start": 328.6, "text": "network to write t flow code to train" }, { "start": 331.8, "text": "another neural network and our model can" }, { "start": 333.8, "text": "produce examples of functional and" }, { "start": 336.68, "text": "usable pieces of code that satisfy this" }, { "start": 340.44, "text": "English prompt while walking through" }, { "start": 342.919, "text": "each part of the code independently so" }, { "start": 344.96, "text": "not even just producing it but actually" }, { "start": 346.8, "text": "educating and teaching the user on what" }, { "start": 349.28, "text": "each part of these uh code blocks are" }, { "start": 351.72, "text": "actually doing you can see example here" }, { "start": 355.16, "text": "and really what I'm trying to show you" }, { "start": 356.72, "text": "with all of this is that this is just" }, { "start": 359.639, "text": "highlighting how far deep learning has" }, { "start": 362.16, "text": "gone even in a couple years since we've" }, { "start": 364.84, "text": "started teaching this course I mean" }, { "start": 367.4, "text": "going back even from before that to" }, { "start": 369.12, "text": "eight years ago and the most amazing" }, { "start": 371.68, "text": "thing that you'll see in this course in" }, { "start": 374.599, "text": "my opinion is that what we try to do" }, { "start": 377.479, "text": "here is to teach you the foundations of" }, { "start": 379.44, "text": "all of this how all of these different" }, { "start": 381.599, "text": "types of models are created from the" }, { "start": 383.72, "text": "ground up and how we can make all of" }, { "start": 386.599, "text": "these amazing advances possible so that" }, { "start": 388.759, "text": "you can also do it on your own as well" }, { "start": 391.44, "text": "and like I mentioned in the beginning" }, { "start": 392.72, "text": "this introduction course is getting" }, { "start": 394.68, "text": "harder and harder to do uh and to make" }, { "start": 397.84, "text": "every year I don't know where the field" }, { "start": 399.56, "text": "is going to be next year and I mean" }, { "start": 402.36, "text": "that's my my honest truth or even" }, { "start": 405.039, "text": "honestly in even one or two months time" }, { "start": 407.28, "text": "from now uh just because it's moving so" }, { "start": 410.28, "text": "incredibly fast but what I do know is" }, { "start": 412.8, "text": "that uh what we will share with 
you in" }, { "start": 414.84, "text": "the course as part of this one week is" }, { "start": 417.56, "text": "going to be the foundations of all of" }, { "start": 419.12, "text": "the tech technologies that we have seen" }, { "start": 421.039, "text": "up until this point that will allow you" }, { "start": 422.84, "text": "to create that future for yourselves and" }, { "start": 425.0, "text": "to design brand new types of deep" }, { "start": 427.039, "text": "learning models uh using those" }, { "start": 429.599, "text": "fundamentals and those" }, { "start": 432.44, "text": "foundations so let's get started with" }, { "start": 435.479, "text": "with all of that and start to figure out" }, { "start": 437.199, "text": "how we can actually achieve all of these" }, { "start": 439.52, "text": "different pieces and learn all of these" }, { "start": 442.319, "text": "different components and we should start" }, { "start": 444.52, "text": "this by really tackling the foundations" }, { "start": 447.56, "text": "from the very beginning and asking" }, { "start": 449.08, "text": "ourselves" }, { "start": 450.16, "text": "you know we've heard this term I think" }, { "start": 451.68, "text": "all of you obviously before you've come" }, { "start": 453.56, "text": "to this class today you've heard the" }, { "start": 455.0, "text": "term deep learning but it's important" }, { "start": 456.919, "text": "for you to really understand how this" }, { "start": 459.12, "text": "concept of deep learning relates to all" }, { "start": 461.919, "text": "of the other pieces of science that" }, { "start": 463.879, "text": "you've learned about so far so to do" }, { "start": 466.52, "text": "that we have to start from the very" }, { "start": 467.919, "text": "beginning and start by thinking about" }, { "start": 469.68, "text": "what is intelligence at its core not" }, { "start": 472.08, "text": "even artificial intelligence but just" }, { "start": 474.0, "text": "intelligence right so the way I like to" }, { "start": 476.039, "text": "think about this is that I like to think" }, { "start": 478.68, "text": "that in elligence is the ability to" }, { "start": 482.759, "text": "process" }, { "start": 483.759, "text": "information which will inform your" }, { "start": 486.08, "text": "future decision-mak" }, { "start": 487.72, "text": "abilities now that's something that we" }, { "start": 489.759, "text": "as humans do every single day now" }, { "start": 492.479, "text": "artificial intelligence is simply the" }, { "start": 495.08, "text": "ability for us to give computers that" }, { "start": 497.479, "text": "same ability to process information and" }, { "start": 500.68, "text": "inform future" }, { "start": 502.479, "text": "decisions now machine learning is simply" }, { "start": 505.639, "text": "a subset of artificial intelligence the" }, { "start": 508.599, "text": "way you should think of machine learning" }, { "start": 510.72, "text": "is just as the programming ability or" }, { "start": 513.599, "text": "let's say even simpler than that machine" }, { "start": 515.479, "text": "learning is the science" }, { "start": 518.64, "text": "of of trying to teach computers how to" }, { "start": 522.24, "text": "do that processing of information and" }, { "start": 524.76, "text": "decision making from data so instead of" }, { "start": 527.92, "text": "hardcoding some of these rules into" }, { "start": 529.88, "text": "machines and programming them like we" }, { "start": 532.16, "text": "used to do in in software engineering" 
}, { "start": 534.0, "text": "classes now we're going to try and do" }, { "start": 536.04, "text": "that processing of information and" }, { "start": 538.36, "text": "informing a future decision decision" }, { "start": 539.64, "text": "making abilities directly from data and" }, { "start": 542.6, "text": "then going one step deeper deep learning" }, { "start": 544.959, "text": "is simply the subset of machine learning" }, { "start": 547.24, "text": "which uses neural networks to do that it" }, { "start": 549.92, "text": "uses neural networks to process raw" }, { "start": 552.56, "text": "pieces of data now unprocessed data and" }, { "start": 555.72, "text": "allows them to ingest all of those very" }, { "start": 558.16, "text": "large data sets and inform future" }, { "start": 560.56, "text": "decisions now that's exactly what this" }, { "start": 563.24, "text": "class is is really all about if you" }, { "start": 565.6, "text": "think of if I had to summarize this" }, { "start": 567.44, "text": "class in just one line it's about" }, { "start": 569.76, "text": "teaching machines how to process data" }, { "start": 572.519, "text": "process information and inform" }, { "start": 574.959, "text": "decision-mak abilities from that data" }, { "start": 577.44, "text": "and learn it from that" }, { "start": 579.64, "text": "data now this program is split between" }, { "start": 584.079, "text": "really two different parts so you should" }, { "start": 586.0, "text": "think of this class as being captured" }, { "start": 588.04, "text": "with both technical lectures which for" }, { "start": 590.92, "text": "example this is one part of as well as" }, { "start": 593.56, "text": "software Labs we'll have several new" }, { "start": 596.04, "text": "updates this year as I mentioned earlier" }, { "start": 598.12, "text": "just covering the rap changing of" }, { "start": 600.0, "text": "advances in Ai and especially in some of" }, { "start": 602.76, "text": "the later lectures you're going to see" }, { "start": 604.44, "text": "those the first lecture today is going" }, { "start": 606.839, "text": "to cover the foundations of neural" }, { "start": 608.88, "text": "networks themselves uh starting with" }, { "start": 611.64, "text": "really the building blocks of every" }, { "start": 613.32, "text": "single neural network which is called" }, { "start": 614.76, "text": "the perceptron and finally we'll go" }, { "start": 617.399, "text": "through the week and we'll conclude with" }, { "start": 619.88, "text": "a series of exciting guest lectures from" }, { "start": 622.72, "text": "industry leading sponsors of the course" }, { "start": 625.68, "text": "and finally on the software side after" }, { "start": 629.64, "text": "every lecture you'll also get software" }, { "start": 632.079, "text": "experience and project building" }, { "start": 633.839, "text": "experience to be able to take what we" }, { "start": 635.72, "text": "teach in lectures and actually deploy" }, { "start": 637.88, "text": "them in real code and and actually" }, { "start": 640.839, "text": "produce based on the learnings that you" }, { "start": 643.24, "text": "find in this lecture and at the very end" }, { "start": 644.959, "text": "of the class from the software side" }, { "start": 646.92, "text": "you'll have the ability to participate" }, { "start": 648.839, "text": "in a really fun day at the very end" }, { "start": 651.32, "text": "which is the project pitch competition" }, { "start": 653.519, "text": "it's kind of like a shark tank 
style" }, { "start": 655.36, "text": "competition of all of the different uh" }, { "start": 657.639, "text": "projects from all of you and win some" }, { "start": 659.8, "text": "really awesome prizes so let's step" }, { "start": 662.24, "text": "through that a little bit briefly this" }, { "start": 663.6, "text": "is the the syllabus part of the lecture" }, { "start": 666.72, "text": "so each day we'll have dedicated" }, { "start": 668.399, "text": "software Labs that will basically mirror" }, { "start": 671.16, "text": "all of the technical lectures that we go" }, { "start": 672.92, "text": "through just helping you reinforce your" }, { "start": 674.48, "text": "learnings and these are coupled with" }, { "start": 676.8, "text": "each day again coupled with prizes for" }, { "start": 679.639, "text": "the top performing software solutions" }, { "start": 681.76, "text": "that are coming up in the class this is" }, { "start": 683.519, "text": "going to start with today with lab one" }, { "start": 686.12, "text": "and it's going to be on music generation" }, { "start": 688.32, "text": "so you're going to learn how to build a" }, { "start": 689.8, "text": "neural network that can learn from a" }, { "start": 692.44, "text": "bunch of musical songs listen to them" }, { "start": 695.76, "text": "and then learn to compose brand new" }, { "start": 697.76, "text": "songs in that same" }, { "start": 700.44, "text": "genre tomorrow lab two on computer" }, { "start": 703.32, "text": "vision you're going to learn about" }, { "start": 705.639, "text": "facial detection systems you'll build a" }, { "start": 707.92, "text": "facial detection system from scratch" }, { "start": 710.279, "text": "using uh convolutional neural networks" }, { "start": 712.6, "text": "you'll learn what that means tomorrow" }, { "start": 714.72, "text": "and you'll also learn how to actually" }, { "start": 716.92, "text": "debias remove the biases that exist in" }, { "start": 719.76, "text": "some of these facial detection systems" }, { "start": 721.959, "text": "which is a huge problem for uh the" }, { "start": 724.079, "text": "state-of-the-art solutions that exist" }, { "start": 725.839, "text": "today and finally a brand new Lab at the" }, { "start": 729.2, "text": "end of the course will focus on large" }, { "start": 731.36, "text": "language models well where you're" }, { "start": 733.36, "text": "actually going to take a billion" }, { "start": 735.32, "text": "multi-billion parameter large language" }, { "start": 737.24, "text": "model and fine-tune it to build an" }, { "start": 740.279, "text": "assistive chatbot and evaluate a set of" }, { "start": 743.56, "text": "cognitive abilities ranging from" }, { "start": 745.079, "text": "mathematics abilities to Scientific" }, { "start": 746.839, "text": "reasoning to logical abilities and so so" }, { "start": 750.199, "text": "on and finally at the very very end" }, { "start": 753.16, "text": "there will be a final project pitch" }, { "start": 755.24, "text": "competition for up to 5 minutes per team" }, { "start": 758.92, "text": "and all of these are accompanied with" }, { "start": 760.92, "text": "great prices so definitely there will be" }, { "start": 762.959, "text": "a lot of fun to be had throughout the" }, { "start": 764.32, "text": "week there are many resources to help" }, { "start": 767.12, "text": "with this class you'll see them posted" }, { "start": 769.079, "text": "here you don't need to write them down" }, { "start": 770.32, "text": "because all of the slides 
are already" }, { "start": 771.8, "text": "posted online please post to Piaza if" }, { "start": 774.279, "text": "you have any questions and of course we" }, { "start": 777.16, "text": "have an amazing team uh that is helping" }, { "start": 779.959, "text": "teach this course this year and you can" }, { "start": 782.079, "text": "reach out to any of us if you have any" }, { "start": 783.88, "text": "questions the Piaza is a great place to" }, { "start": 785.76, "text": "start myself and AA will be the two main" }, { "start": 788.8, "text": "lectures for this course uh Monday" }, { "start": 791.32, "text": "through Wednesday especially and we'll" }, { "start": 793.079, "text": "also be hearing some amazing guest" }, { "start": 794.76, "text": "lectures on the second half of the" }, { "start": 796.88, "text": "course which definitely you would want" }, { "start": 798.639, "text": "to attend because they they really cover" }, { "start": 800.88, "text": "the really state-of-the-art sides of" }, { "start": 803.16, "text": "deep learning uh that's going on in" }, { "start": 805.24, "text": "Industry outside of" }, { "start": 807.68, "text": "Academia and very briefly just want to" }, { "start": 809.959, "text": "give a huge thanks to all of our" }, { "start": 811.76, "text": "sponsors who without their support this" }, { "start": 813.88, "text": "course like every year would not be" }, { "start": 816.279, "text": "possible okay so now let's start with" }, { "start": 818.519, "text": "the the fun stuff and my favorite part" }, { "start": 820.8, "text": "of of the course which is the technical" }, { "start": 822.6, "text": "parts and let's start by just asking" }, { "start": 824.76, "text": "ourselves a question right which is you" }, { "start": 828.399, "text": "know why do we care about all of this" }, { "start": 830.279, "text": "why do we care about deep learning why" }, { "start": 831.639, "text": "did you all come here today to learn and" }, { "start": 834.079, "text": "to listen to this" }, { "start": 835.8, "text": "course so to understand I think we again" }, { "start": 838.72, "text": "need to go back a little bit to" }, { "start": 840.88, "text": "understand how machine learning used to" }, { "start": 842.68, "text": "be uh performed right so machine" }, { "start": 845.48, "text": "learning typically would Define a set of" }, { "start": 849.24, "text": "features or you can think of these as" }, { "start": 850.92, "text": "kind of a set of things to look for in" }, { "start": 853.839, "text": "an image or in a piece of data usually" }, { "start": 856.44, "text": "these are hand engineered so humans" }, { "start": 858.639, "text": "would have to Define these themselves" }, { "start": 861.24, "text": "and the problem with these is that they" }, { "start": 862.759, "text": "tend to be very brittle in practice just" }, { "start": 865.279, "text": "by nature of a human defining them so" }, { "start": 867.519, "text": "the key idea of keep learning and what" }, { "start": 869.8, "text": "you're going to learn throughout this" }, { "start": 871.079, "text": "entire week is this Paradigm Shift of" }, { "start": 873.56, "text": "trying to move away from hand" }, { "start": 875.199, "text": "engineering features and rules that" }, { "start": 877.839, "text": "computer should look for and instead" }, { "start": 879.72, "text": "trying to learn them directly from raw" }, { "start": 882.72, "text": "pieces of data so what are the patterns" }, { "start": 885.639, "text": "that we need to look at in 
data sets" }, { "start": 888.399, "text": "such that if we look at those patterns" }, { "start": 890.44, "text": "we can make some interesting decisions" }, { "start": 892.36, "text": "and interesting actions can come out so" }, { "start": 894.88, "text": "for example if we wanted to learn how to" }, { "start": 897.12, "text": "detect faces we might if you think even" }, { "start": 900.16, "text": "how you would detect faces right if you" }, { "start": 901.8, "text": "look at a picture what are you looking" }, { "start": 903.279, "text": "for to detect a face you're looking for" }, { "start": 905.16, "text": "some particular patterns you're looking" }, { "start": 907.0, "text": "for eyes and noses and ears and when" }, { "start": 909.639, "text": "those things are all composed in a" }, { "start": 911.16, "text": "certain way you would probably deduce" }, { "start": 913.16, "text": "that that's a face right computers do" }, { "start": 915.6, "text": "something very similar so they have to" }, { "start": 917.88, "text": "understand what are the patterns that" }, { "start": 919.6, "text": "they look for what are the eyes and" }, { "start": 921.24, "text": "noses and ears of those pieces of data" }, { "start": 924.48, "text": "and then from there actually detect and" }, { "start": 927.8, "text": "predict from them" }, { "start": 930.959, "text": "so the really interesting thing I think" }, { "start": 934.12, "text": "about deep learning is that these" }, { "start": 936.12, "text": "foundations for doing exactly what I" }, { "start": 938.44, "text": "just mentioned picking out the building" }, { "start": 940.6, "text": "blocks picking out the features from raw" }, { "start": 943.04, "text": "pieces of data and the underlying" }, { "start": 945.199, "text": "algorithms themselves have existed for" }, { "start": 947.6, "text": "many many decades now the question I" }, { "start": 952.199, "text": "would ask at this point is so why are we" }, { "start": 954.639, "text": "studying this now and why is all of this" }, { "start": 956.519, "text": "really blowing up right now and" }, { "start": 958.16, "text": "exploding with so many great advances" }, { "start": 960.44, "text": "well for one there's three things right" }, { "start": 962.639, "text": "number one is that the data that is" }, { "start": 964.56, "text": "available to us today is significantly" }, { "start": 967.839, "text": "more pervasive these models are hungry" }, { "start": 970.199, "text": "for data you're going to learn about" }, { "start": 971.68, "text": "this more in detail but these models are" }, { "start": 973.759, "text": "extremely hungry for data and we're" }, { "start": 975.92, "text": "living in a world right now quite" }, { "start": 978.88, "text": "frankly where data is more abundant than" }, { "start": 981.0, "text": "it has ever been in our history now" }, { "start": 983.959, "text": "secondly these algorithms are massively" }, { "start": 986.88, "text": "compute hungry they're and they're" }, { "start": 988.36, "text": "massively parallelizable which means" }, { "start": 990.6, "text": "that they have greatly benefited from" }, { "start": 993.72, "text": "compute Hardware which is also capable" }, { "start": 996.12, "text": "of being parallelized the particular" }, { "start": 999.319, "text": "name of that Hardware is called a GPU" }, { "start": 1001.68, "text": "right gpus can run parallel processing" }, { "start": 1004.6, "text": "uh streams of information and are" }, { "start": 1007.0, "text": "particularly 
amenable to deep learning algorithms, and the abundance of GPUs and that compute hardware has also pushed forward what we can do in deep learning.

[16:56] Finally, the last piece is the software: the open-source tools that are used as the foundational building blocks for building and deploying all of the underlying models you're going to learn about in this course. Those open-source tools have become extremely streamlined, making it extremely easy for all of us to learn about these technologies within an amazing one-week course like this.

[17:21] So, now that we have some of the background, let's start with understanding exactly what the fundamental building block of a neural network is. That building block is called a perceptron. Every single neural network is built up of multiple perceptrons, and you're going to learn how those perceptrons, number one, compute information themselves, and number two, how they connect to form much larger, billion-parameter neural networks.

[17:51] For the key idea of a perceptron, even simpler, think of it as a single neuron: a neural network is composed of many, many neurons, and a perceptron is just one neuron. The idea of a perceptron is actually extremely simple, and I hope that by the end of today this idea, and the processing a perceptron does, becomes extremely clear to you.
to be" }, { "start": 1126.4, "text": "denoted here by W1 through WM so this is" }, { "start": 1129.24, "text": "a corresponding weight for every single" }, { "start": 1130.96, "text": "input and you should think of this as" }, { "start": 1132.6, "text": "really uh you know every weight being" }, { "start": 1134.96, "text": "assigned to that input right the weights" }, { "start": 1137.96, "text": "are part of the neuron itself now you" }, { "start": 1141.32, "text": "multiply all of these inputs with their" }, { "start": 1143.32, "text": "weights together and then you add them" }, { "start": 1144.88, "text": "up we take this single number after that" }, { "start": 1147.559, "text": "addition and you pass it through what's" }, { "start": 1149.679, "text": "called a nonlinear activation function" }, { "start": 1152.12, "text": "to produce your final output which here" }, { "start": 1154.039, "text": "be calling" }, { "start": 1158.159, "text": "y now what I just said is not entirely" }, { "start": 1161.84, "text": "correct right so I missed out one" }, { "start": 1163.799, "text": "critical piece of information that piece" }, { "start": 1165.52, "text": "of information is that we also have what" }, { "start": 1167.559, "text": "you can see here is called this bias" }, { "start": 1169.6, "text": "term that bias term is actually what" }, { "start": 1172.6, "text": "allows your neuron neuron to shift its" }, { "start": 1176.159, "text": "activation function horizontally on that" }, { "start": 1178.679, "text": "x axis if you think of it right so on" }, { "start": 1182.12, "text": "the right side you can now see this" }, { "start": 1183.799, "text": "diagram illustrating mathematically that" }, { "start": 1186.48, "text": "single equation that I talked through" }, { "start": 1188.559, "text": "kind of conceptually right now you can" }, { "start": 1190.159, "text": "see it mathematically written down as" }, { "start": 1191.96, "text": "one single equation and we can actually" }, { "start": 1194.28, "text": "rewrite this using linear algebra using" }, { "start": 1196.96, "text": "vectors and Dot products so let's do" }, { "start": 1199.28, "text": "that right so now our inputs are going" }, { "start": 1200.919, "text": "to be described by a capital x which is" }, { "start": 1203.96, "text": "simply a vector of all of our inputs X1" }, { "start": 1206.84, "text": "through XM and then our weights are" }, { "start": 1209.44, "text": "going to be described by a capital W" }, { "start": 1212.12, "text": "which is going to be uh W1 through WM" }, { "start": 1215.84, "text": "the input is obtained by taking the dot" }, { "start": 1218.159, "text": "product of X and W right that dot" }, { "start": 1221.799, "text": "product does that element wise" }, { "start": 1223.08, "text": "multiplication and then adds sums all of" }, { "start": 1226.0, "text": "the the element wise multiplications and" }, { "start": 1228.48, "text": "then here's the missing piece is that" }, { "start": 1230.36, "text": "we're now going to add that bias term" }, { "start": 1232.799, "text": "here we're calling the bias term" }, { "start": 1234.72, "text": "w0 right and then we're going to apply" }, { "start": 1236.919, "text": "the nonlinearity which here denoted as Z" }, { "start": 1239.52, "text": "or G excuse me so I've mentioned this" }, { "start": 1242.84, "text": "nonlinearity a few times this activation" }, { "start": 1245.039, "text": "function let's dig into it a little bit" }, { "start": 1247.039, "text": "more so we can 
[20:39] I've mentioned this nonlinearity, this activation function, a few times now, so let's dig into it a little more and understand what it is actually doing. I said it's a nonlinear function. Here you can see one example: one commonly used activation function is the sigmoid function, which you can see on the bottom right-hand side of the screen. The sigmoid function is very commonly used because of its outputs: it takes as input any real number (the x-axis is unbounded in both directions), but on the y-axis it squashes every input into a number between zero and one. That makes it a very common choice for things like probability distributions, if you want to convert your answers into probabilities or teach a neuron to learn a probability distribution.

[21:36] In fact, there are many different types of nonlinear activation functions used in neural networks, and here are some common ones. Throughout this presentation, and throughout the entire course, you'll see these little TensorFlow icons on the bottom of the slides; they relate some of the foundational knowledge we're teaching in the lectures to the software labs, and they may provide a good starting point for a lot of the pieces you'll have to build later in the software parts of the class.

[22:06] The sigmoid activation we talked about on the last slide is shown on the left-hand side; it's very popular because of the probability interpretation, since it squashes everything between zero and one. But you can see two other very common types of activation functions in the middle and on the right-hand side as well.
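As a quick illustration of these functions, the sketch below evaluates a few of the standard choices with TensorFlow. The input value z = -6.0 is just an arbitrary example, and tanh is included on the assumption that it is the middle function on the slide; the transcript only names sigmoid and ReLU explicitly.

```python
import tensorflow as tf

z = tf.constant(-6.0)

# Sigmoid: 1 / (1 + exp(-z)), squashes any real number into (0, 1).
y_sigmoid = tf.math.sigmoid(z)   # ~0.0025

# Tanh: squashes any real number into (-1, 1).
y_tanh = tf.math.tanh(z)         # ~-1.0

# ReLU (rectified linear unit): max(0, z), zero for negative inputs.
y_relu = tf.nn.relu(z)           # 0.0

print(y_sigmoid.numpy(), y_tanh.numpy(), y_relu.numpy())
```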
}, { "start": 1354.919, "text": "function or also called the rectified" }, { "start": 1356.72, "text": "linear unit so basically it's linear" }, { "start": 1359.08, "text": "everywhere except there's a nonlinearity" }, { "start": 1361.279, "text": "at x equals z so there's a kind of a" }, { "start": 1364.039, "text": "step or a break discontinuity right so" }, { "start": 1366.96, "text": "benefit of this very easy to compute it" }, { "start": 1369.44, "text": "still has the nonlinearity which we kind" }, { "start": 1371.44, "text": "of need and we'll talk about why we need" }, { "start": 1372.96, "text": "it in one second but it's very fast" }, { "start": 1375.72, "text": "right just two linear functions" }, { "start": 1377.32, "text": "piecewise combined with each" }, { "start": 1379.44, "text": "other okay so now let's talk about why" }, { "start": 1381.72, "text": "we need a nonlinearity in the first" }, { "start": 1383.72, "text": "place why why not just deal with a" }, { "start": 1386.12, "text": "linear function that we pass all of" }, { "start": 1387.679, "text": "these inputs through so the point of the" }, { "start": 1390.039, "text": "activation function even at all why do" }, { "start": 1392.799, "text": "we have this is to introduce" }, { "start": 1395.279, "text": "nonlinearities in of itself so what we" }, { "start": 1398.6, "text": "want to do is to allow our neural" }, { "start": 1401.2, "text": "network to deal with nonlinear data" }, { "start": 1404.64, "text": "right our neural networks need the" }, { "start": 1406.76, "text": "ability to deal with nonlinear data" }, { "start": 1408.72, "text": "because the world is extremely nonlinear" }, { "start": 1412.4, "text": "right this is important because you know" }, { "start": 1414.559, "text": "if you think of the real world real data" }, { "start": 1416.679, "text": "sets this is just the way they are right" }, { "start": 1419.4, "text": "if you look at data sets like this one" }, { "start": 1421.24, "text": "green and red points right and I ask you" }, { "start": 1423.279, "text": "to build a neural network that can" }, { "start": 1425.76, "text": "separate the green and the red points" }, { "start": 1428.559, "text": "this means that we actually need a" }, { "start": 1431.2, "text": "nonlinear function to do that we cannot" }, { "start": 1432.96, "text": "solve this problem with a single line" }, { "start": 1435.88, "text": "right in fact if we used linear uh" }, { "start": 1439.559, "text": "linear functions as your activation" }, { "start": 1441.679, "text": "function no matter how big your neural" }, { "start": 1443.72, "text": "network is it's still a linear function" }, { "start": 1445.919, "text": "because linear functions combined with" }, { "start": 1447.36, "text": "linear functions are still linear so no" }, { "start": 1449.96, "text": "matter how deep or how many parameters" }, { "start": 1451.72, "text": "your neural network has the best they" }, { "start": 1453.64, "text": "would be able to do to separate these" }, { "start": 1455.24, "text": "green and red points would look like" }, { "start": 1456.679, "text": "this but adding nonlinearities allows" }, { "start": 1459.64, "text": "our neural networks to be smaller by" }, { "start": 1462.48, "text": "allowing them to be more expressive and" }, { "start": 1464.64, "text": "capture more complexities in the data" }, { "start": 1466.919, "text": "sets and this allows them to be much" }, { "start": 1468.6, "text": "more powerful in the end so let's" }, { "start": 
1472.12, "text": "understand this with a simple example" }, { "start": 1474.0, "text": "imagine I give you now this trained" }, { "start": 1475.76, "text": "neural network so what does it mean" }, { "start": 1476.96, "text": "trained neural network it means now I'm" }, { "start": 1478.44, "text": "giving you the weights right not only" }, { "start": 1480.52, "text": "the inputs but I'm going to tell you" }, { "start": 1482.279, "text": "what the weights of this neural network" }, { "start": 1483.64, "text": "are so here let's say the bias term w0" }, { "start": 1487.279, "text": "is going to be one and our W Vector is" }, { "start": 1490.799, "text": "going to be 3 and ne2 right these are" }, { "start": 1493.76, "text": "just the weights of your train neural" }, { "start": 1494.96, "text": "network let's worry about how we got" }, { "start": 1496.679, "text": "those weights in a second but this" }, { "start": 1498.799, "text": "network has two inputs X1 and X2 now if" }, { "start": 1503.36, "text": "we want to get the output of this neural" }, { "start": 1505.88, "text": "network all we have to do simply is to" }, { "start": 1508.52, "text": "do the same story that we talked about" }, { "start": 1510.12, "text": "before right it's dot" }, { "start": 1512.919, "text": "product inputs with weights add the bias" }, { "start": 1517.48, "text": "and apply the nonlinearity right and" }, { "start": 1519.24, "text": "those are the three components that you" }, { "start": 1520.72, "text": "really have to remember as part of this" }, { "start": 1522.64, "text": "class right dot product uh add the bias" }, { "start": 1526.64, "text": "and apply a nonlinearity that's going to" }, { "start": 1528.799, "text": "be the process that keeps repeating over" }, { "start": 1530.48, "text": "and over and over again for every single" }, { "start": 1532.799, "text": "neuron after that happens that neuron" }, { "start": 1535.679, "text": "was going to Output a single number" }, { "start": 1538.24, "text": "right now let's take a look at what's" }, { "start": 1540.159, "text": "inside of that nonlinearity it's simply" }, { "start": 1542.88, "text": "a weighted combination of those uh of" }, { "start": 1547.399, "text": "those inputs with those weights right so" }, { "start": 1549.24, "text": "if we look at what's inside of G right" }, { "start": 1552.399, "text": "inside of G is a weighted combination of" }, { "start": 1554.72, "text": "X and" }, { "start": 1555.72, "text": "W right added with a bias" }, { "start": 1558.919, "text": "right that's going to produce a single" }, { "start": 1561.52, "text": "number right but in reality for any" }, { "start": 1564.12, "text": "input that this model could see what" }, { "start": 1566.48, "text": "this really is is a two-dimensional line" }, { "start": 1568.52, "text": "because we have two parameters in this" }, { "start": 1571.039, "text": "model so we can actually plot that line" }, { "start": 1574.12, "text": "we can see exactly how this neuron" }, { "start": 1578.0, "text": "separates points on these axes between" }, { "start": 1581.32, "text": "X1 and X2 right these are the two inputs" }, { "start": 1583.84, "text": "of this model we can see exactly and" }, { "start": 1586.559, "text": "interpret exactly what this neuron is is" }, { "start": 1588.48, "text": "doing right we can visualize its entire" }, { "start": 1590.679, "text": "space because we can plot the line that" }, { "start": 1593.0, "text": "defines this neuron right so here we're" }, { "start": 1595.559, 
"text": "plotting when that line equals" }, { "start": 1597.72, "text": "zero and in fact if I give you if I give" }, { "start": 1601.279, "text": "that neuron in fact a new data point" }, { "start": 1603.72, "text": "here the new data point is X1 = -1 and" }, { "start": 1606.559, "text": "X2 = 2 just an arbitrary point in this" }, { "start": 1609.2, "text": "two-dimensional space we can plot that" }, { "start": 1611.32, "text": "point in the two-dimensional space And" }, { "start": 1613.24, "text": "depending on which side of the line it" }, { "start": 1615.0, "text": "falls on it tells us you know what the" }, { "start": 1618.36, "text": "what the answer is going to be what the" }, { "start": 1619.919, "text": "sign of the answer is going to be and" }, { "start": 1622.0, "text": "also what the answer itself is going to" }, { "start": 1623.799, "text": "be right so if we follow that that" }, { "start": 1625.96, "text": "equation written on the top here and" }, { "start": 1627.88, "text": "plug in -1 and 2 we're going to get 1 -" }, { "start": 1631.279, "text": "3 - 4 which equal" }, { "start": 1634.44, "text": "-6 right and when I put that into my" }, { "start": 1637.36, "text": "nonlinearity G I'm going to get a final" }, { "start": 1640.559, "text": "output of" }, { "start": 1643.12, "text": "0.2 right so that that don't worry about" }, { "start": 1645.64, "text": "the final output that's just going to be" }, { "start": 1647.039, "text": "the output for that signal function but" }, { "start": 1649.52, "text": "the important point to remember here is" }, { "start": 1651.88, "text": "that the sigmoid function actually" }, { "start": 1653.52, "text": "divides the space into these two parts" }, { "start": 1656.799, "text": "right it squashes everything between Z" }, { "start": 1659.08, "text": "and one but it divides it implicitly by" }, { "start": 1662.279, "text": "everything less than 0.5 and greater" }, { "start": 1665.159, "text": "than 0.5 depending on if it's on if x is" }, { "start": 1668.279, "text": "less than zero or greater than zero so" }, { "start": 1671.159, "text": "depending on which side of the line that" }, { "start": 1673.08, "text": "you fall on remember the line is when x" }, { "start": 1675.76, "text": "equals z the input to the sigmoid is" }, { "start": 1677.64, "text": "zero if you fall on the left side of the" }, { "start": 1680.159, "text": "line your output will be less than 0.5" }, { "start": 1684.08, "text": "because you're falling on the negative" }, { "start": 1685.72, "text": "side of the line if your output is if" }, { "start": 1688.2, "text": "your input is on the right side of the" }, { "start": 1689.88, "text": "line now your output is going to be" }, { "start": 1692.84, "text": "greater than" }, { "start": 1694.279, "text": "0.5 right so here we can actually" }, { "start": 1696.679, "text": "visualize this space this is called the" }, { "start": 1698.72, "text": "feature space of a neural network we can" }, { "start": 1701.2, "text": "visualize it in its completion right we" }, { "start": 1704.08, "text": "can totally visualize and interpret this" }, { "start": 1706.08, "text": "neural network we can understand exactly" }, { "start": 1708.24, "text": "what it's going to do for any input that" }, { "start": 1710.36, "text": "it sees right but of course this is a" }, { "start": 1712.88, "text": "very simple neuron right it's not a" }, { "start": 1714.6, "text": "neural network it's just one neuron and" }, { "start": 1716.84, "text": "even more than that 
[28:14] So here we can actually visualize this space; this is called the feature space of the neural network, and we can visualize it in its entirety. We can fully visualize and interpret this neural network, and understand exactly what it's going to do for any input it sees. But of course, this is a very simple case: it's not a neural network, it's just one neuron, and even then it's a very simple neuron with only two inputs.

[28:32] In reality, the kinds of neurons and neural networks you're going to be dealing with in this course have millions or even billions of these parameters, these inputs. Here we only have two weights, w1 and w2, but today's neural networks have billions of such parameters, so drawing the kind of plot you see here becomes a lot more challenging; it's actually not possible.

[29:09] Now that we have some of the intuition behind a perceptron, let's start building neural networks and see how all of this comes together. Let's revisit the previous diagram of the perceptron. If there's only one thing to take away from this lecture, it's to remember how a perceptron works, because that equation is extremely important for every single class that comes after today, and it has only three steps: dot product with the inputs, add a bias, and apply your nonlinearity.

[29:43] Let's simplify the diagram a little bit. I'll remove the weight labels from this picture; you can now assume that if I show a line, every single line has an associated weight that comes with it. I'll also remove the bias term for simplicity; assume that every neuron has that bias term, and I don't need to show it. Now note that the result, which I'm now calling z, is just the dot product plus the bias, before the nonlinearity; it is linear so far, just a weighted sum of those pieces, since we haven't applied the nonlinearity yet. Our final output is simply g(z): the nonlinear activation function applied to z.
"start": 1841.48, "text": "have two outputs well now we can just" }, { "start": 1843.48, "text": "have two neurons in this network right" }, { "start": 1846.84, "text": "every neuron say sees all of the inputs" }, { "start": 1849.76, "text": "that came before it but now you see the" }, { "start": 1852.2, "text": "top neuron is going to be predicting an" }, { "start": 1854.76, "text": "answer and the bottom neuron will" }, { "start": 1856.12, "text": "predict its own answer now importantly" }, { "start": 1858.159, "text": "one thing you should really notice here" }, { "start": 1859.519, "text": "is that each neuron has its own weights" }, { "start": 1863.519, "text": "right each neuron has its own lines that" }, { "start": 1865.639, "text": "are coming into just that neuron right" }, { "start": 1867.96, "text": "so they're acting independently but they" }, { "start": 1870.08, "text": "can later on communicate if you have" }, { "start": 1872.039, "text": "another" }, { "start": 1873.24, "text": "layer" }, { "start": 1876.24, "text": "right so let's start now by initializing" }, { "start": 1880.32, "text": "this uh this process a bit further and" }, { "start": 1883.639, "text": "thinking about it more programmatically" }, { "start": 1885.679, "text": "right what if we wanted to program this" }, { "start": 1887.919, "text": "this neural network ourselves from" }, { "start": 1890.2, "text": "scratch right remember that equation I" }, { "start": 1891.96, "text": "told you it didn't sound very complex" }, { "start": 1893.639, "text": "it's take a DOT product add a bias which" }, { "start": 1896.32, "text": "is a single number and apply" }, { "start": 1898.08, "text": "nonlinearity let's see how we would" }, { "start": 1899.6, "text": "actually Implement something like that" }, { "start": 1901.44, "text": "so to to define the layer right we're" }, { "start": 1904.12, "text": "now going to call this a layer uh which" }, { "start": 1906.639, "text": "is a collection of neurons right we have" }, { "start": 1910.799, "text": "to first Define how that information" }, { "start": 1913.36, "text": "propagates through the network so we can" }, { "start": 1915.639, "text": "do that by creating a call function here" }, { "start": 1918.0, "text": "first we're going to actually Define the" }, { "start": 1919.76, "text": "weights for that Network right so" }, { "start": 1922.159, "text": "remember every Network every neuron I" }, { "start": 1924.519, "text": "should say every neuron has weights and" }, { "start": 1926.679, "text": "a bias right so let's define those first" }, { "start": 1929.84, "text": "we're going to create the call function" }, { "start": 1931.799, "text": "to actually see how we can pass" }, { "start": 1935.12, "text": "information through that layer right so" }, { "start": 1938.2, "text": "this is going to take us input and" }, { "start": 1939.76, "text": "inputs right this is like what we" }, { "start": 1941.639, "text": "previously called X and it's the same" }, { "start": 1944.679, "text": "story that we've been seeing this whole" }, { "start": 1946.44, "text": "class right we're going to Matrix" }, { "start": 1948.76, "text": "multiply or take a DOT product of our" }, { "start": 1950.679, "text": "inputs with our" }, { "start": 1952.159, "text": "weights we're going to add a bias and" }, { "start": 1955.279, "text": "then we're going to apply a nonlinearity" }, { "start": 1957.639, "text": "it's really that simple right we've now" }, { "start": 1959.919, "text": "created a single layer 
neural" }, { "start": 1963.639, "text": "network right so this this line in" }, { "start": 1966.559, "text": "particular this is the part that allows" }, { "start": 1968.279, "text": "us to" }, { "start": 1969.519, "text": "be a powerful neural network maintaining" }, { "start": 1972.559, "text": "that" }, { "start": 1973.559, "text": "nonlinearity and the important thing" }, { "start": 1976.12, "text": "here is to note that" }, { "start": 1979.0, "text": "modern deep learning toolboxes and" }, { "start": 1981.24, "text": "libraries already Implement a lot of" }, { "start": 1983.36, "text": "these for you right so it's important" }, { "start": 1985.2, "text": "for you to understand the foundations" }, { "start": 1987.32, "text": "but in practice all of that layer" }, { "start": 1990.0, "text": "architecture and all that layer logic is" }, { "start": 1992.639, "text": "actually implemented in tools like" }, { "start": 1994.799, "text": "tensorflow and P torch through a dense" }, { "start": 1997.32, "text": "layer right so here you can see an" }, { "start": 1998.799, "text": "example of calling or creating" }, { "start": 2002.0, "text": "initializing a dense layer with two" }, { "start": 2005.84, "text": "neurons right allowing it to feed in an" }, { "start": 2008.96, "text": "arbitrary set of inputs here we're" }, { "start": 2010.639, "text": "seeing these two neurons in a layer" }, { "start": 2013.12, "text": "being fed three inputs right and in code" }, { "start": 2016.32, "text": "it's only reduced down to this one line" }, { "start": 2018.72, "text": "of tensorflow code making it extremely" }, { "start": 2020.679, "text": "easy and convenient for us to use these" }, { "start": 2023.559, "text": "functions and call them so now let's" }, { "start": 2026.159, "text": "look at our single layered neural" }, { "start": 2028.08, "text": "network this is where we have now one" }, { "start": 2030.519, "text": "layer between our input and our outputs" }, { "start": 2033.639, "text": "right so we're slowly and progressively" }, { "start": 2036.039, "text": "increasing the complexity of our neural" }, { "start": 2038.2, "text": "network so that we can build up all of" }, { "start": 2039.84, "text": "these building blocks right this layer" }, { "start": 2043.48, "text": "in the middle is called a hidden layer" }, { "start": 2046.44, "text": "right obviously because you don't" }, { "start": 2047.679, "text": "directly observe it you don't directly" }, { "start": 2049.24, "text": "supervise it right you do observe the" }, { "start": 2051.839, "text": "two input and output layers but your" }, { "start": 2053.599, "text": "hidden layer is just kind of a uh a" }, { "start": 2056.159, "text": "neuron neuron layer that you don't" }, { "start": 2058.599, "text": "directly observe right it just gives" }, { "start": 2060.28, "text": "your network more capacity more learning" }, { "start": 2063.72, "text": "complexity and since we now have a" }, { "start": 2065.599, "text": "transformation function from inputs to" }, { "start": 2068.0, "text": "Hidden layers and hidden layers to" }, { "start": 2070.159, "text": "Output we now have a two- layered neural" }, { "start": 2073.24, "text": "network right which means that we also" }, { "start": 2076.2, "text": "have two weight matrices right we don't" }, { "start": 2078.839, "text": "have just the W1 which we previously had" }, { "start": 2081.72, "text": "to create this hidden layer but now we" }, { "start": 2083.28, "text": "also have W2 which does the" }, { 
"start": 2085.04, "text": "transformation from hidden layer to" }, { "start": 2086.44, "text": "Output layer yes what happens" }, { "start": 2088.96, "text": "nonlinearity in Hidden you have just" }, { "start": 2091.04, "text": "linear so there's no it's not is it a" }, { "start": 2093.52, "text": "perceptron or not yes so every hidden" }, { "start": 2096.32, "text": "layer also has an nonlinearity" }, { "start": 2098.64, "text": "accompanied with it right and that's a" }, { "start": 2100.4, "text": "very important point because if you" }, { "start": 2101.72, "text": "don't have that perceptron then it's" }, { "start": 2103.56, "text": "just a very large linear function" }, { "start": 2105.68, "text": "followed by a final nonlinearity at the" }, { "start": 2107.64, "text": "very end right so you need that" }, { "start": 2109.8, "text": "cascading and uh you know overlapping" }, { "start": 2113.24, "text": "application of nonlinearities that occur" }, { "start": 2115.839, "text": "throughout the" }, { "start": 2117.599, "text": "network" }, { "start": 2119.56, "text": "awesome okay so now let's zoom in look" }, { "start": 2122.88, "text": "at a single unit in the hidden layer" }, { "start": 2125.28, "text": "take this one for example let's call it" }, { "start": 2127.079, "text": "Z2 right it's the second neuron in the" }, { "start": 2129.4, "text": "first layer right it's the same" }, { "start": 2131.72, "text": "perception that we saw before we compute" }, { "start": 2134.2, "text": "its answer by taking a DOT product of" }, { "start": 2136.599, "text": "its weights with its inputs adding a" }, { "start": 2139.56, "text": "bias and then applying a nonlinearity if" }, { "start": 2142.32, "text": "we took a different hidden nodee like Z3" }, { "start": 2145.2, "text": "the one right below it we would compute" }, { "start": 2147.48, "text": "its answer exactly the same way that we" }, { "start": 2149.119, "text": "computed Z2 except its weights would be" }, { "start": 2151.76, "text": "different than the weights of Z2" }, { "start": 2153.24, "text": "everything else stays exactly the same" }, { "start": 2154.839, "text": "it sees the same inputs but of course" }, { "start": 2157.2, "text": "you know I'm not going to actually show" }, { "start": 2158.599, "text": "Z3 in this picture and now this picture" }, { "start": 2161.2, "text": "is getting a little bit messy so let's" }, { "start": 2162.72, "text": "clean things up a little bit more I'm" }, { "start": 2164.119, "text": "going to remove all the lines now and" }, { "start": 2165.92, "text": "replace them just with these these boxes" }, { "start": 2168.48, "text": "these symbols that will denote what we" }, { "start": 2171.079, "text": "call a fully connected layer right so" }, { "start": 2173.16, "text": "these layers now denote that everything" }, { "start": 2175.359, "text": "in our input is connected to everything" }, { "start": 2176.92, "text": "in our output and the transformation is" }, { "start": 2179.0, "text": "exactly as we saw before dot product" }, { "start": 2181.28, "text": "bias and" }, { "start": 2184.599, "text": "nonlinearity and again in code to do" }, { "start": 2187.24, "text": "this is extremely straightforward with" }, { "start": 2189.0, "text": "the foundations that we've built up from" }, { "start": 2190.76, "text": "the beginning of the class we can now" }, { "start": 2192.8, "text": "just Define two of these dense layers" }, { "start": 2195.4, "text": "right our hidden layer on line one with" }, { "start": 
2197.68, "text": "n hidden units and then our output layer" }, { "start": 2200.839, "text": "with two hidden output units does that" }, { "start": 2203.359, "text": "mean the nonlinearity function must be" }, { "start": 2205.079, "text": "the same between layers nonlinearity" }, { "start": 2207.599, "text": "function does not need to be the same" }, { "start": 2208.96, "text": "through through each layer often times" }, { "start": 2211.24, "text": "it is because of convenience there's" }, { "start": 2214.64, "text": "there are some cases where you would" }, { "start": 2216.079, "text": "want it to be different as well" }, { "start": 2218.0, "text": "especially in lecture two you're going" }, { "start": 2220.079, "text": "to see nonlinearities be different even" }, { "start": 2222.359, "text": "within the same layer um let alone" }, { "start": 2225.2, "text": "different layers but uh unless for a" }, { "start": 2229.2, "text": "particular reason generally convention" }, { "start": 2230.92, "text": "is there's no need to keep them" }, { "start": 2234.04, "text": "differently now let's keep expanding our" }, { "start": 2237.2, "text": "knowledge a little bit more if we now" }, { "start": 2238.599, "text": "want to make a deep neural network not" }, { "start": 2240.48, "text": "just a neural network like we saw in the" }, { "start": 2242.64, "text": "previous side now it's deep all that" }, { "start": 2244.28, "text": "means is that we're now going to stack" }, { "start": 2246.359, "text": "these layers on top of each other one by" }, { "start": 2248.319, "text": "one more and more creating a" }, { "start": 2250.56, "text": "hierarchical model right the ones where" }, { "start": 2253.2, "text": "the final output is now going to be" }, { "start": 2255.52, "text": "computed by going deeper and deeper and" }, { "start": 2257.52, "text": "deeper into the neural network and again" }, { "start": 2261.28, "text": "doing this in code again follows the" }, { "start": 2263.56, "text": "exact same story as before just" }, { "start": 2265.24, "text": "cascading these tensorflow layers on top" }, { "start": 2268.359, "text": "of each other and just going deeper into" }, { "start": 2270.68, "text": "the" }, { "start": 2272.4, "text": "network okay so now this is great" }, { "start": 2275.0, "text": "because now we have at least a solid" }, { "start": 2276.96, "text": "foundational understanding of how to not" }, { "start": 2279.28, "text": "only Define a single neuron but how to" }, { "start": 2281.319, "text": "define an entire neural network and you" }, { "start": 2283.0, "text": "should be able to actually explain at" }, { "start": 2284.76, "text": "this point or understand how information" }, { "start": 2287.4, "text": "goes from input through an entire neural" }, { "start": 2290.68, "text": "network to compute an output so now" }, { "start": 2293.68, "text": "let's look at how we can apply these" }, { "start": 2295.44, "text": "neural networks to solve a very real" }, { "start": 2298.2, "text": "problem that uh I'm sure all of you care" }, { "start": 2300.52, "text": "about so here's a problem on how we want" }, { "start": 2302.839, "text": "to build an AI system to learn to answer" }, { "start": 2305.24, "text": "the following question which is will I" }, { "start": 2307.92, "text": "pass this class right I'm sure all of" }, { "start": 2310.079, "text": "you are really worried about this" }, { "start": 2312.52, "text": "question um so to do this let's start" }, { "start": 2315.359, "text": "with a 
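
A minimal sketch of the layer stacking just described, assuming a hypothetical hidden width `n_hidden` and ReLU activations (the lecture leaves both choices open):

```python
import tensorflow as tf

n_hidden = 16   # "n": the number of hidden units is up to you

# one hidden layer between input and output: two weight matrices, W1 and W2
model = tf.keras.Sequential([
    tf.keras.layers.Dense(n_hidden, activation="relu"),   # hidden layer (W1)
    tf.keras.layers.Dense(2),                              # output layer (W2)
])

# a deep network just cascades more hidden layers on top of each other
deep_model = tf.keras.Sequential([
    tf.keras.layers.Dense(n_hidden, activation="relu"),
    tf.keras.layers.Dense(n_hidden, activation="relu"),
    tf.keras.layers.Dense(2),
])
```
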
simple input feature model the" }, { "start": 2318.28, "text": "feature the two features that let's" }, { "start": 2320.48, "text": "concern ourselves with are going to be" }, { "start": 2322.24, "text": "number one how many lectures you attend" }, { "start": 2325.56, "text": "and number two how many hours you spend" }, { "start": 2329.2, "text": "on your final" }, { "start": 2330.599, "text": "project so let's look at some of the" }, { "start": 2333.599, "text": "past years of this class right we can" }, { "start": 2335.64, "text": "actually observe how different people" }, { "start": 2338.48, "text": "have uh lived in this space right" }, { "start": 2341.64, "text": "between how many lectures and how much" }, { "start": 2343.44, "text": "time You' spent on your final project" }, { "start": 2345.319, "text": "and you can actually see every point is" }, { "start": 2347.2, "text": "a person the color of that point is" }, { "start": 2349.599, "text": "going to be if they passed or failed the" }, { "start": 2351.2, "text": "class and you can see and visualize kind" }, { "start": 2353.76, "text": "of this V this feature space if you will" }, { "start": 2356.64, "text": "that we talked about before and then we" }, { "start": 2358.4, "text": "have you you fall right here you're the" }, { "start": 2360.839, "text": "point" }, { "start": 2361.88, "text": "45 uh right in between the the this uh" }, { "start": 2365.92, "text": "feature space you've attended four" }, { "start": 2368.119, "text": "lectures and you will spend 5 hours on" }, { "start": 2370.04, "text": "the final project and you want to build" }, { "start": 2372.0, "text": "a neural network to determine given" }, { "start": 2374.68, "text": "everyone else in the class right that" }, { "start": 2376.88, "text": "I've seen from all of the previous years" }, { "start": 2379.2, "text": "you want to help you want to have your" }, { "start": 2381.04, "text": "neural network help you to understand" }, { "start": 2383.599, "text": "what is your likelihood that you will" }, { "start": 2386.24, "text": "pass or fail this class so let's do it" }, { "start": 2389.119, "text": "we now have all of the building blocks" }, { "start": 2390.68, "text": "to solve this problem using a neural" }, { "start": 2392.28, "text": "network let's do it so we have two" }, { "start": 2394.319, "text": "inputs those inputs are the number of" }, { "start": 2396.4, "text": "lectures you attend and number of hours" }, { "start": 2398.44, "text": "you spend on your final project it's" }, { "start": 2400.599, "text": "four and five we can pass those two" }, { "start": 2402.16, "text": "inputs to our two uh X1 and X2 variables" }, { "start": 2407.04, "text": "these are fed into this single layered" }, { "start": 2410.04, "text": "single hidden layered neural network it" }, { "start": 2412.96, "text": "has three hidden units in the middle and" }, { "start": 2415.319, "text": "we can see that the final predicted" }, { "start": 2417.04, "text": "output probability for you to pass this" }, { "start": 2419.2, "text": "class is 0.1 or 10% right so very Bleak" }, { "start": 2423.2, "text": "outcome it's not a good outcome um the" }, { "start": 2427.04, "text": "actual ual probability is one right so" }, { "start": 2430.8, "text": "attending four out of the five lectures" }, { "start": 2432.359, "text": "and spending 5 hours in your final" }, { "start": 2433.92, "text": "project you actually lived in a part of" }, { "start": 2435.52, "text": "the feature space which was actually" 
}, { "start": 2436.92, "text": "very positive right it looked like you" }, { "start": 2438.24, "text": "were going to pass the class so what" }, { "start": 2439.8, "text": "happened here anyone have any ideas so" }, { "start": 2441.92, "text": "why did the neural network get this so" }, { "start": 2443.68, "text": "terribly wrong right it's not trained" }, { "start": 2446.92, "text": "exactly so this neural network is not" }, { "start": 2448.44, "text": "trained we haven't shown any of that" }, { "start": 2450.76, "text": "data the green and red data right so you" }, { "start": 2453.72, "text": "should really think of neural networks" }, { "start": 2455.76, "text": "like babies right before they see data" }, { "start": 2458.72, "text": "they haven't learned anything there's no" }, { "start": 2460.96, "text": "expectation that we should have for them" }, { "start": 2462.92, "text": "to be able to solve any of these types" }, { "start": 2464.359, "text": "of problems before we teach them" }, { "start": 2465.96, "text": "something about the world so let's teach" }, { "start": 2468.24, "text": "this neural network something about uh" }, { "start": 2470.44, "text": "the problem first right and to train it" }, { "start": 2472.599, "text": "we first need to tell our neural network" }, { "start": 2475.92, "text": "when it's making bad decisions right so" }, { "start": 2478.359, "text": "we need to teach it right really train" }, { "start": 2480.56, "text": "it to learn exactly like how we as" }, { "start": 2482.92, "text": "humans learn in some ways right so we" }, { "start": 2484.96, "text": "have to inform the neural network when" }, { "start": 2486.96, "text": "it gets the answer incorrect so that it" }, { "start": 2489.16, "text": "can learn how to get the answer correct" }, { "start": 2492.28, "text": "right so the closer the answer is to the" }, { "start": 2495.359, "text": "ground truth so right so for example the" }, { "start": 2497.76, "text": "actual value for you passing this class" }, { "start": 2500.04, "text": "was probability one 100% but it" }, { "start": 2502.88, "text": "predicted a probability of" }, { "start": 2504.76, "text": "0.1 we compute what's called a loss" }, { "start": 2507.76, "text": "right so the closer these two things are" }, { "start": 2509.72, "text": "together the smaller your loss should be" }, { "start": 2512.319, "text": "and the and the more accurate your model" }, { "start": 2514.359, "text": "should" }, { "start": 2515.76, "text": "be so let's assume that we have data not" }, { "start": 2518.76, "text": "just from one student but now we have" }, { "start": 2521.119, "text": "data from many students we many students" }, { "start": 2523.28, "text": "have taken this class before and we can" }, { "start": 2524.64, "text": "plug all of them into the neural network" }, { "start": 2526.119, "text": "and show them all to this to this system" }, { "start": 2528.72, "text": "now we care not only about how the" }, { "start": 2530.76, "text": "neural network did on just this one" }, { "start": 2532.68, "text": "prediction but we care about how it" }, { "start": 2534.76, "text": "predicted on all of these different" }, { "start": 2536.72, "text": "people that the neural network has shown" }, { "start": 2538.839, "text": "in the past as well during this training" }, { "start": 2541.2, "text": "and learning process so when training" }, { "start": 2543.559, "text": "the neural network we want to find a" }, { "start": 2545.119, "text": "network that minimizes the 
empirical" }, { "start": 2549.04, "text": "loss between our predictions and those" }, { "start": 2552.16, "text": "ground truth outputs and we're going to" }, { "start": 2553.68, "text": "do this on average across all of the" }, { "start": 2556.359, "text": "different inputs that the that the model" }, { "start": 2559.48, "text": "has" }, { "start": 2560.48, "text": "seen if we look at this problem of" }, { "start": 2562.88, "text": "binary" }, { "start": 2563.92, "text": "classification right between yeses and" }, { "start": 2566.68, "text": "NOS right will I pass the class or will" }, { "start": 2568.96, "text": "I not pass the class it's a zero or one" }, { "start": 2572.16, "text": "probability and we can use what is" }, { "start": 2574.079, "text": "called the softmax function or the" }, { "start": 2575.96, "text": "softmax cross entry function to be able" }, { "start": 2578.68, "text": "to inform if this network is getting the" }, { "start": 2581.76, "text": "answer correct or incorrect right the" }, { "start": 2584.079, "text": "softmax cross or the cross entropy" }, { "start": 2585.96, "text": "function think of this as a as an" }, { "start": 2587.76, "text": "objective function it's a loss function" }, { "start": 2590.0, "text": "that tells our neural network how far" }, { "start": 2592.64, "text": "away these two probability distributions" }, { "start": 2594.68, "text": "are right so the output is a probability" }, { "start": 2597.2, "text": "distribution we're trying to determine" }, { "start": 2599.079, "text": "how bad of an answer the neural network" }, { "start": 2601.96, "text": "is predicting so that we can give it" }, { "start": 2603.48, "text": "feedback to get a better" }, { "start": 2605.319, "text": "answer now let's suppose in instead of" }, { "start": 2607.52, "text": "training a or predicting a binary output" }, { "start": 2610.559, "text": "we want to predict a real valued output" }, { "start": 2613.48, "text": "like a like any number it can take any" }, { "start": 2615.28, "text": "number plus or minus infinity so for" }, { "start": 2617.76, "text": "example if you wanted to predict the uh" }, { "start": 2620.24, "text": "grade that you get in a class right" }, { "start": 2623.28, "text": "doesn't necessarily need to be between Z" }, { "start": 2625.16, "text": "and one or Z and 100 even right you" }, { "start": 2627.92, "text": "could now use a different loss in order" }, { "start": 2629.839, "text": "to produce that value because our" }, { "start": 2631.76, "text": "outputs are no longer a probability" }, { "start": 2633.96, "text": "distribution right so for example what" }, { "start": 2636.16, "text": "you might do here is compute a mean" }, { "start": 2638.119, "text": "squared error probabil or mean squared" }, { "start": 2640.119, "text": "error loss function between your true" }, { "start": 2641.839, "text": "value or your true grade of the class" }, { "start": 2644.88, "text": "and the predicted grade right these are" }, { "start": 2646.8, "text": "two numbers they're not probabilities" }, { "start": 2648.88, "text": "necessarily you compute their difference" }, { "start": 2651.24, "text": "you square it to to look at a distance" }, { "start": 2653.52, "text": "between the two an absolute distance" }, { "start": 2656.28, "text": "right sign doesn't matter and then you" }, { "start": 2658.52, "text": "can minimize this thing" }, { "start": 2661.0, "text": "right okay great so let's put all of" }, { "start": 2663.72, "text": "this loss information with 
this problem" }, { "start": 2665.8, "text": "of finding our Network" }, { "start": 2667.839, "text": "into a unified problem and a unified" }, { "start": 2670.44, "text": "solution to actually train our neural" }, { "start": 2674.079, "text": "network so we knowe that we want to find" }, { "start": 2677.559, "text": "a neural network that will solve this" }, { "start": 2679.559, "text": "problem on all this data on average" }, { "start": 2681.92, "text": "right that's how we contextualize this" }, { "start": 2684.0, "text": "problem earlier in the in the lectures" }, { "start": 2686.24, "text": "this means effectively that we're trying" }, { "start": 2687.76, "text": "to solve or we're trying to find what" }, { "start": 2690.839, "text": "are the weights for our neural network" }, { "start": 2693.079, "text": "what are this ve this big Vector W that" }, { "start": 2695.8, "text": "we talked about in earlier in the" }, { "start": 2697.24, "text": "lecture what is this Vector W compute" }, { "start": 2699.92, "text": "this Vector W for me based on all of the" }, { "start": 2702.599, "text": "data that we have seen right now the" }, { "start": 2705.559, "text": "vector W is also going to determine what" }, { "start": 2709.64, "text": "is the loss right so given a single" }, { "start": 2711.92, "text": "Vector w we can compute how bad is this" }, { "start": 2715.2, "text": "neural network performing on our data" }, { "start": 2718.0, "text": "right so what is the loss what is this" }, { "start": 2720.119, "text": "deviation from the ground truth of our" }, { "start": 2722.64, "text": "network uh based on where it should" }, { "start": 2725.28, "text": "be now remember that that W is just a" }, { "start": 2729.559, "text": "group of a bunch of numbers right it's a" }, { "start": 2732.559, "text": "very big list of numbers a list of" }, { "start": 2735.48, "text": "Weights uh for every single layer and" }, { "start": 2738.52, "text": "every single neuron in our neural" }, { "start": 2740.88, "text": "network right so it's just a very big" }, { "start": 2743.359, "text": "list or a vector of of Weights we want" }, { "start": 2745.839, "text": "to find that Vector what is that Vector" }, { "start": 2748.04, "text": "based on a lot of data that's the" }, { "start": 2749.599, "text": "problem of training a neural network and" }, { "start": 2751.88, "text": "remember our loss function is just a" }, { "start": 2754.24, "text": "simple function of our weights if we" }, { "start": 2757.28, "text": "have only two weights in our neural" }, { "start": 2758.92, "text": "network like we saw earlier in the slide" }, { "start": 2761.04, "text": "then we can plot the Lost landscape over" }, { "start": 2763.839, "text": "this two-dimensional space right so we" }, { "start": 2765.72, "text": "have two weights W1 and W2 and for every" }, { "start": 2768.8, "text": "single configuration or setting of those" }, { "start": 2772.04, "text": "two weights our loss will have a" }, { "start": 2774.599, "text": "particular value which here we're" }, { "start": 2775.88, "text": "showing is the height of this graph" }, { "start": 2778.16, "text": "right so for any W1 and W2 what is the" }, { "start": 2781.52, "text": "loss and what we want to do is find the" }, { "start": 2784.52, "text": "lowest point what is the best loss where" }, { "start": 2787.48, "text": "what are the weights such that our loss" }, { "start": 2790.359, "text": "will be as good as possible so the" }, { "start": 2793.04, "text": "smaller the loss the 
better so we want" }, { "start": 2794.48, "text": "to find the lowest point in this" }, { "start": 2797.599, "text": "graph now how do we do that right so the" }, { "start": 2800.76, "text": "way this works is we start somewhere in" }, { "start": 2803.88, "text": "this space we don't know where to start" }, { "start": 2805.24, "text": "so let's pick a random place to start" }, { "start": 2808.079, "text": "right now from that place let's compute" }, { "start": 2812.559, "text": "What's called the gradient of the" }, { "start": 2814.359, "text": "landscape at that particular point this" }, { "start": 2816.48, "text": "is a very local estimate of where is" }, { "start": 2819.88, "text": "going up basically where where is the" }, { "start": 2822.079, "text": "slope increasing at my current location" }, { "start": 2825.28, "text": "right that informs us not only where the" }, { "start": 2827.2, "text": "slope is increasing but more importantly" }, { "start": 2829.72, "text": "where the slope is decreasing if I" }, { "start": 2831.28, "text": "negate the direction if I go in the" }, { "start": 2832.68, "text": "opposite direction I can actually step" }, { "start": 2835.04, "text": "down into the landscape and change my" }, { "start": 2837.839, "text": "weights such that I lower my" }, { "start": 2840.559, "text": "loss so let's take a small step just a" }, { "start": 2843.359, "text": "small step in the opposite direction of" }, { "start": 2845.319, "text": "the part that's going up let's take a" }, { "start": 2847.559, "text": "small step going down and we'll keep" }, { "start": 2849.88, "text": "repeating this process we'll compute a" }, { "start": 2851.559, "text": "new gradient at that new point and then" }, { "start": 2853.88, "text": "we'll take another small step and we'll" }, { "start": 2855.28, "text": "keep doing this over and over and over" }, { "start": 2856.96, "text": "again until we converge at what's called" }, { "start": 2859.04, "text": "a local minimum right so based on where" }, { "start": 2861.76, "text": "we started it may not be a global" }, { "start": 2864.04, "text": "minimum of everywhere in this lost" }, { "start": 2865.8, "text": "landscape but let's find ourselves now" }, { "start": 2867.72, "text": "in a local minimum and we're guaranteed" }, { "start": 2869.599, "text": "to actually converge by following this" }, { "start": 2871.28, "text": "very simple algorithm at a local" }, { "start": 2874.359, "text": "minimum so let's summarize now this" }, { "start": 2876.44, "text": "algorithm this algorithm is called" }, { "start": 2878.2, "text": "gradient descent let's summarize it" }, { "start": 2879.8, "text": "first in pseudo code and then we'll look" }, { "start": 2881.8, "text": "at it in actual code in a second so" }, { "start": 2884.599, "text": "there's a few steps first step is we" }, { "start": 2886.64, "text": "initialize our location somewhere" }, { "start": 2889.2, "text": "randomly in this weight space right we" }, { "start": 2892.4, "text": "compute the gradient of of our loss at" }, { "start": 2897.04, "text": "with respect to our weights okay and" }, { "start": 2900.24, "text": "then we take a small step in the" }, { "start": 2901.76, "text": "opposite direction and we keep repeating" }, { "start": 2903.76, "text": "this in a loop over and over and over" }, { "start": 2905.48, "text": "again and we say we keep we keep doing" }, { "start": 2907.2, "text": "this until convergence right until we" }, { "start": 2909.359, "text": "stop moving basically and 
our Network" }, { "start": 2911.72, "text": "basically finds where it's supposed to" }, { "start": 2913.359, "text": "end up we'll talk about this this uh" }, { "start": 2917.0, "text": "this small step right so we're" }, { "start": 2918.599, "text": "multiplying our gradient by what I keep" }, { "start": 2920.92, "text": "calling is a small step we'll talk about" }, { "start": 2923.0, "text": "that a bit more about a bit more in" }, { "start": 2925.72, "text": "later part of this this lecture but for" }, { "start": 2928.079, "text": "now let's also very quickly show the" }, { "start": 2930.079, "text": "analogous part in in code as well and it" }, { "start": 2933.28, "text": "mirrors very nicely right so we'll" }, { "start": 2935.2, "text": "randomly initialize our weight" }, { "start": 2937.599, "text": "this happens every time you train a" }, { "start": 2938.92, "text": "neural network you have to randomly" }, { "start": 2940.28, "text": "initialize the weights and then you have" }, { "start": 2941.92, "text": "a loop right here showing it without" }, { "start": 2944.799, "text": "even convergence right we're just going" }, { "start": 2946.359, "text": "to keep looping forever where we say" }, { "start": 2949.119, "text": "okay we're going to compute the loss at" }, { "start": 2950.76, "text": "that location compute the gradient so" }, { "start": 2953.28, "text": "which way is up and then we just negate" }, { "start": 2956.359, "text": "that gradient multiply it by some what's" }, { "start": 2958.48, "text": "called learning rate LR denoted here" }, { "start": 2960.839, "text": "it's a small step and then we take a" }, { "start": 2963.119, "text": "direction in that small" }, { "start": 2965.319, "text": "step so let's take a deeper look at this" }, { "start": 2968.119, "text": "term here this is called the gradient" }, { "start": 2969.92, "text": "right this tells us which way is up in" }, { "start": 2971.92, "text": "that landscape and this again it tells" }, { "start": 2974.839, "text": "us even more than that it tells us how" }, { "start": 2976.64, "text": "is our landscape how is our loss" }, { "start": 2979.319, "text": "changing as a function of all of our" }, { "start": 2981.799, "text": "weights but I actually have not told you" }, { "start": 2984.44, "text": "how to compute this so let's talk about" }, { "start": 2986.559, "text": "that process that process is called back" }, { "start": 2988.68, "text": "propagation we'll go through this very" }, { "start": 2990.72, "text": "very briefly and we'll start with the" }, { "start": 2993.24, "text": "simplest neural network uh that's" }, { "start": 2995.68, "text": "possible right so we already saw the" }, { "start": 2997.68, "text": "simplest building block which is a" }, { "start": 2999.24, "text": "single neuron now let's build the" }, { "start": 3000.599, "text": "simplest neural network which is just a" }, { "start": 3002.88, "text": "one neuron neural network right so it" }, { "start": 3005.24, "text": "has one hidden neuron it goes from input" }, { "start": 3007.2, "text": "to Hidden neuron to output and we want" }, { "start": 3009.839, "text": "to compute the gradient of our loss with" }, { "start": 3012.24, "text": "respect to this weight W2 okay so I'm" }, { "start": 3015.92, "text": "highlighting it here so we have two" }, { "start": 3017.68, "text": "weights let's compute the gradient first" }, { "start": 3020.48, "text": "with respect to W2 and that tells us how" }, { "start": 3023.72, "text": "much does a small change 
in w 2 affect" }, { "start": 3027.68, "text": "our loss does our loss go up or down if" }, { "start": 3029.88, "text": "we move our W2 a little bit in One" }, { "start": 3032.2, "text": "Direction or another so let's write out" }, { "start": 3035.0, "text": "this derivative we can start by applying" }, { "start": 3037.0, "text": "the chain rule backwards from the loss" }, { "start": 3039.68, "text": "through the" }, { "start": 3040.559, "text": "output and specifically we can actually" }, { "start": 3043.64, "text": "decompose this law this uh derivative" }, { "start": 3047.0, "text": "this gradient into two parts right so" }, { "start": 3049.16, "text": "the first part we're decomposing it from" }, { "start": 3051.52, "text": "DJ" }, { "start": 3052.68, "text": "dw2 into DJ Dy right which is our output" }, { "start": 3058.839, "text": "multiplied by Dy dw2 right this is all" }, { "start": 3062.319, "text": "possible right it's a chain rule it's a" }, { "start": 3064.839, "text": "I'm just reciting a chain rule here from" }, { "start": 3067.92, "text": "calculus this is possible because Y is" }, { "start": 3070.359, "text": "only dependent on the previous layer and" }, { "start": 3073.24, "text": "now let's suppose we don't want to do" }, { "start": 3074.48, "text": "this for W2 but we want to do it for W1" }, { "start": 3076.96, "text": "we can use the exact same process right" }, { "start": 3078.64, "text": "but now it's one step further right" }, { "start": 3080.76, "text": "we'll now replace W2 with W1 we need to" }, { "start": 3083.4, "text": "apply the chain rule yet again once" }, { "start": 3085.52, "text": "again to decompose the problem further" }, { "start": 3087.2, "text": "and now we propagate our old gradient" }, { "start": 3089.0, "text": "that we computed for W2 all the way back" }, { "start": 3092.28, "text": "one more step uh to the weight that" }, { "start": 3094.48, "text": "we're interested in which in this case" }, { "start": 3095.92, "text": "is" }, { "start": 3097.0, "text": "W1 and we keep repeating this process" }, { "start": 3099.68, "text": "over and over again propagating these" }, { "start": 3101.4, "text": "gradients backwards from output to input" }, { "start": 3104.4, "text": "to compute ultimately what we want in" }, { "start": 3106.799, "text": "the end is this derivative of every" }, { "start": 3109.64, "text": "weight so the the derivative of our loss" }, { "start": 3112.48, "text": "with respect to every weight in our" }, { "start": 3114.04, "text": "neural network this tells us how much" }, { "start": 3115.799, "text": "does a small change in every single" }, { "start": 3117.559, "text": "weight in our Network affect the loss" }, { "start": 3119.44, "text": "does our loss go up or down if we change" }, { "start": 3121.24, "text": "this weight a little bit in this" }, { "start": 3122.799, "text": "direction or a little bit in that" }, { "start": 3124.079, "text": "direction yes I think you use the term" }, { "start": 3127.16, "text": "neuron is perceptron is there a" }, { "start": 3129.2, "text": "functional difference neuron and" }, { "start": 3130.76, "text": "perceptron are the same so typically" }, { "start": 3132.64, "text": "people say neural network which is why" }, { "start": 3134.52, "text": "like a single neuron it's also gotten" }, { "start": 3136.559, "text": "popularity but originally a perceptron" }, { "start": 3139.2, "text": "is is the the formal term the two terms" }, { "start": 3141.88, "text": "are" }, { "start": 3144.48, "text": 
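
Written out, the two chain-rule decompositions described above are as follows, where J(W) is the loss, \hat{y} is the network's output, and z_1 denotes the hidden neuron's output, the intermediate quantity sitting between w_1 and \hat{y}:

```latex
\frac{\partial J(W)}{\partial w_2}
  = \frac{\partial J(W)}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial w_2}
\qquad\qquad
\frac{\partial J(W)}{\partial w_1}
  = \frac{\partial J(W)}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial z_1} \cdot \frac{\partial z_1}{\partial w_1}
```
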
"identical Okay so now we've covered a" }, { "start": 3148.0, "text": "lot so we've covered the forward" }, { "start": 3149.28, "text": "propagation of information through a" }, { "start": 3150.839, "text": "neuron and through a neural network all" }, { "start": 3153.2, "text": "the way through and we've covered now" }, { "start": 3155.04, "text": "the back propagation of information to" }, { "start": 3157.839, "text": "understand how we should uh change every" }, { "start": 3160.16, "text": "single one of those weights in our" }, { "start": 3161.44, "text": "neural network to improve our" }, { "start": 3164.319, "text": "loss so that was the back propop" }, { "start": 3166.839, "text": "algorithm in theory it's actually pretty" }, { "start": 3169.559, "text": "simple it's just a chain rule right" }, { "start": 3171.64, "text": "there's nothing there's actually nothing" }, { "start": 3172.92, "text": "more than than just the chain Rule and" }, { "start": 3175.799, "text": "the nice part that deep learning" }, { "start": 3177.2, "text": "libraries actually do this for you so" }, { "start": 3178.92, "text": "they compute back prop for you you don't" }, { "start": 3180.599, "text": "actually have to implement it yourself" }, { "start": 3181.96, "text": "which is very convenient but now it's" }, { "start": 3184.04, "text": "important to touch on even though the" }, { "start": 3186.24, "text": "theory is actually not that complicated" }, { "start": 3188.119, "text": "for back propagation let's touch on it" }, { "start": 3190.28, "text": "now from practice now thinking a little" }, { "start": 3192.559, "text": "bit towards your own implementations" }, { "start": 3194.2, "text": "when you want to implement these neural" }, { "start": 3196.079, "text": "networks what are some insights so" }, { "start": 3198.92, "text": "optimization of neural networks in" }, { "start": 3200.76, "text": "practice is a completely different story" }, { "start": 3202.839, "text": "it's not straightforward at all and in" }, { "start": 3205.64, "text": "practice it's very difficult and usually" }, { "start": 3207.799, "text": "very computationally intensive to do" }, { "start": 3209.799, "text": "this backrop algorithm so here's an" }, { "start": 3212.079, "text": "illustration from a paper that came out" }, { "start": 3214.079, "text": "a few years ago that actually attempted" }, { "start": 3216.52, "text": "to visualize a very deep neural" }, { "start": 3218.599, "text": "Network's lost landscape so previously" }, { "start": 3220.599, "text": "we had that other uh depiction" }, { "start": 3222.96, "text": "visualization of how a neural network" }, { "start": 3225.0, "text": "would look in a two-dimensional" }, { "start": 3226.0, "text": "landscape real neural networks are not" }, { "start": 3228.04, "text": "two-dimensional" }, { "start": 3229.68, "text": "they're hundreds or millions or billions" }, { "start": 3232.2, "text": "of dimensions and now what would those" }, { "start": 3235.799, "text": "lost landscap apes look like you can" }, { "start": 3237.599, "text": "actually try some clever techniques to" }, { "start": 3239.64, "text": "actually visualize them this is one" }, { "start": 3240.88, "text": "paper that attempted to do that and it" }, { "start": 3243.28, "text": "turns out that they look extremely messy" }, { "start": 3246.68, "text": "right um the important thing is that if" }, { "start": 3249.799, "text": "you do this algorithm and you start in a" }, { "start": 3251.88, "text": "bad place depending on 
your neural" }, { "start": 3253.64, "text": "network you may not actually end up in" }, { "start": 3255.92, "text": "the the global solution right so your" }, { "start": 3258.0, "text": "initialization matters a lot and you" }, { "start": 3260.04, "text": "need to kind of Traverse these local" }, { "start": 3261.839, "text": "Minima and try to try and help you find" }, { "start": 3264.24, "text": "the global Minima or even more than that" }, { "start": 3266.799, "text": "you need to construct neural networks" }, { "start": 3269.48, "text": "that have lost Landscapes that are much" }, { "start": 3271.88, "text": "more amenable to optimization than this" }, { "start": 3274.04, "text": "one right so this is a very bad lost" }, { "start": 3275.599, "text": "landscape there are some techniques that" }, { "start": 3277.64, "text": "we can apply to our neural networks that" }, { "start": 3279.92, "text": "smooth out their lost landscape and make" }, { "start": 3281.68, "text": "them easier to" }, { "start": 3283.04, "text": "optimize so recall that update equation" }, { "start": 3286.04, "text": "that we talked about earlier with" }, { "start": 3287.92, "text": "gradient descent right so there is this" }, { "start": 3289.76, "text": "parameter here that we didn't talk about" }, { "start": 3292.24, "text": "we we described this as the little step" }, { "start": 3294.2, "text": "that you could take right so it's a" }, { "start": 3295.359, "text": "small number that multiply with the" }, { "start": 3297.76, "text": "direction which is your gradient it just" }, { "start": 3299.72, "text": "tells you okay I'm not going to just go" }, { "start": 3301.44, "text": "all the way in this direction I'll just" }, { "start": 3302.839, "text": "take a small step in this direction so" }, { "start": 3305.359, "text": "in practice even setting this value" }, { "start": 3307.88, "text": "right it's just one number setting this" }, { "start": 3309.68, "text": "one number can be rather difficult right" }, { "start": 3312.839, "text": "if we set the learning rate too um small" }, { "start": 3316.68, "text": "then the model can get stuck in these" }, { "start": 3319.04, "text": "local Minima right so here it starts and" }, { "start": 3321.359, "text": "it kind of gets stuck in this local" }, { "start": 3322.839, "text": "Minima it converges very slowly even if" }, { "start": 3325.2, "text": "it doesn't get stuck if the learning" }, { "start": 3327.24, "text": "rate is too large it can kind of" }, { "start": 3328.96, "text": "overshoot and in practice it even" }, { "start": 3331.079, "text": "diverges and explodes and you don't" }, { "start": 3333.839, "text": "actually ever find any" }, { "start": 3335.839, "text": "Minima now ideally what we want is to" }, { "start": 3338.599, "text": "use learning rates that are not too" }, { "start": 3340.4, "text": "small and not too large to so they're" }, { "start": 3343.4, "text": "large enough to basically avoid those" }, { "start": 3345.039, "text": "local Minima but small enough such that" }, { "start": 3347.88, "text": "they won't diverge and they will" }, { "start": 3349.28, "text": "actually still find their way into the" }, { "start": 3352.039, "text": "global Minima so something like this is" }, { "start": 3354.24, "text": "what you should intuitively have in mind" }, { "start": 3356.079, "text": "right so something that can overshoot" }, { "start": 3357.44, "text": "the local minimas but find itself into a" }, { "start": 3359.96, "text": "a better Minima and then 
finally" }, { "start": 3362.119, "text": "stabilize itself there so how do we" }, { "start": 3364.44, "text": "actually set these learning rates right" }, { "start": 3366.44, "text": "in practice what does that process look" }, { "start": 3368.16, "text": "like now idea number one is is very" }, { "start": 3371.44, "text": "basic right it's try a bunch of" }, { "start": 3372.839, "text": "different learning rates and see what" }, { "start": 3374.16, "text": "works and that's actually a not a bad" }, { "start": 3377.28, "text": "process in practice it's one of the" }, { "start": 3378.799, "text": "processes that people use um so that" }, { "start": 3382.28, "text": "that's uh that's interesting but let's" }, { "start": 3383.96, "text": "see if we can do something smarter than" }, { "start": 3385.48, "text": "this and let's see how can design" }, { "start": 3387.64, "text": "algorithms that uh can adapt to the" }, { "start": 3390.52, "text": "Landscapes right so in practice there's" }, { "start": 3392.64, "text": "no reason why this should be a single" }, { "start": 3394.119, "text": "number right can we have learning rates" }, { "start": 3397.119, "text": "that adapt to the model to the data to" }, { "start": 3400.2, "text": "the Landscapes to the gradients that" }, { "start": 3401.799, "text": "it's seeing around so this means that" }, { "start": 3404.039, "text": "the learning rate may actually increase" }, { "start": 3406.2, "text": "or decrease as a function of the" }, { "start": 3409.0, "text": "gradients in the loss function right how" }, { "start": 3411.72, "text": "fast we're learning or many other" }, { "start": 3413.799, "text": "options right there are many different" }, { "start": 3415.76, "text": "ideas that could be done here and in" }, { "start": 3417.359, "text": "fact there are many widely used" }, { "start": 3420.44, "text": "different procedures or methodologies" }, { "start": 3423.28, "text": "for setting the learning rate and during" }, { "start": 3425.88, "text": "your Labs we actually encourage you to" }, { "start": 3427.799, "text": "try out some of these different ideas" }, { "start": 3429.96, "text": "for different types of learning rates" }, { "start": 3431.44, "text": "and and even play around with you know" }, { "start": 3433.48, "text": "what what's the effect of increasing or" }, { "start": 3435.119, "text": "decreasing your learning rate you'll see" }, { "start": 3436.599, "text": "very striking" }, { "start": 3439.559, "text": "differences do it because it's on a" }, { "start": 3441.44, "text": "close interval why not just find the" }, { "start": 3443.799, "text": "absolute minimum you know test" }, { "start": 3447.96, "text": "right so so a few things what number one" }, { "start": 3450.559, "text": "is that it's not a closed space right so" }, { "start": 3452.76, "text": "there's an infinite every every weight" }, { "start": 3454.68, "text": "can be plus or minus up to Infinity" }, { "start": 3457.28, "text": "right so even if it was a" }, { "start": 3459.319, "text": "one-dimensional neural network with just" }, { "start": 3461.24, "text": "one weight it's not a closed" }, { "start": 3463.559, "text": "space in practice it's even worse than" }, { "start": 3466.079, "text": "that because you have billions of" }, { "start": 3468.839, "text": "Dimensions right so not only is your uh" }, { "start": 3472.119, "text": "space your support system in one" }, { "start": 3474.4, "text": "dimension is it infinite but you now" }, { "start": 3476.92, "text": "have 
billions of infinite Dimensions" }, { "start": 3478.76, "text": "right or billions of uh infinite support" }, { "start": 3480.88, "text": "spaces so it's not something that you" }, { "start": 3482.799, "text": "can just like search every weight every" }, { "start": 3484.92, "text": "possible weight in your neural in your" }, { "start": 3487.68, "text": "configuration or what is every possible" }, { "start": 3489.4, "text": "weight that this neural network could" }, { "start": 3490.64, "text": "take and let me test them out because it" }, { "start": 3493.799, "text": "it's not practical to do even for a very" }, { "start": 3495.52, "text": "small neural network in" }, { "start": 3498.96, "text": "practice so in your Labs you can really" }, { "start": 3501.64, "text": "try to put all of this information uh in" }, { "start": 3504.16, "text": "this picture into practice which defines" }, { "start": 3506.96, "text": "your model number one right here defines" }, { "start": 3510.599, "text": "your Optimizer which previously we" }, { "start": 3513.48, "text": "denoted as this gradient descent" }, { "start": 3515.16, "text": "Optimizer here we're calling it uh" }, { "start": 3517.24, "text": "stochastic gradient descent or SGD we'll" }, { "start": 3519.64, "text": "talk about that more in a second and" }, { "start": 3521.799, "text": "then also note that your Optimizer which" }, { "start": 3524.839, "text": "here we're calling SGD could be any of" }, { "start": 3527.52, "text": "these adaptive optimizers you can swap" }, { "start": 3529.28, "text": "them out and you should swap them out" }, { "start": 3530.64, "text": "you should test different things here to" }, { "start": 3532.119, "text": "see the impact of these different" }, { "start": 3534.44, "text": "methods on your training procedure and" }, { "start": 3536.96, "text": "you'll gain very valuable intuition for" }, { "start": 3539.96, "text": "the different insights that will come" }, { "start": 3541.319, "text": "with that as well so I want to continue" }, { "start": 3543.64, "text": "very briefly just for the end of this" }, { "start": 3545.16, "text": "lecture to talk about tips for training" }, { "start": 3547.88, "text": "neural networks in practice and how we" }, { "start": 3549.92, "text": "can focus on this powerful idea of" }, { "start": 3553.359, "text": "really what's called batching data right" }, { "start": 3555.96, "text": "not seeing all of your data but now" }, { "start": 3558.44, "text": "talking about a topic called" }, { "start": 3560.359, "text": "batching so to do this let's very" }, { "start": 3562.599, "text": "briefly revisit this gradient descent" }, { "start": 3564.319, "text": "algorithm the gradient is compute this" }, { "start": 3567.16, "text": "gradient computation the backrop" }, { "start": 3569.039, "text": "algorithm I mentioned this earlier it's" }, { "start": 3570.839, "text": "a very computationally expensive uh" }, { "start": 3573.72, "text": "operation and it's even worse because we" }, { "start": 3576.24, "text": "now are we previously described it in a" }, { "start": 3578.44, "text": "way where we would have to compute it" }, { "start": 3580.0, "text": "over a summation over every single data" }, { "start": 3582.64, "text": "point in our entire data set right" }, { "start": 3584.92, "text": "that's how we defined it with the loss" }, { "start": 3586.24, "text": "function it's an average over all of our" }, { "start": 3588.079, "text": "data points which means that we're" }, { "start": 3589.48, "text": 
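
A sketch of what swapping optimizers looks like in practice; the learning rates and the tiny placeholder model are arbitrary choices, and Adam, Adagrad, and RMSprop are examples of the kind of adaptive optimizers being referred to:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(3, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# plain (stochastic) gradient descent with a fixed learning rate...
sgd = tf.keras.optimizers.SGD(learning_rate=1e-2)
# ...or adaptive optimizers whose effective step size changes during training
adam = tf.keras.optimizers.Adam(learning_rate=1e-3)
# other options to compare in the labs: tf.keras.optimizers.Adagrad, tf.keras.optimizers.RMSprop

model.compile(optimizer=adam, loss="binary_crossentropy")   # swap optimizers here
```
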
"summing over all of our data points the" }, { "start": 3591.44, "text": "gradients so in most real life problems" }, { "start": 3594.359, "text": "this would be completely infeasible to" }, { "start": 3596.119, "text": "do because our data sets are simply too" }, { "start": 3597.72, "text": "big and the models are too big to to" }, { "start": 3600.079, "text": "compute those gradients on every single" }, { "start": 3601.72, "text": "iteration remember this isn't just a" }, { "start": 3603.2, "text": "onetime thing right it's every single" }, { "start": 3605.319, "text": "step that you do you keep taking small" }, { "start": 3607.079, "text": "steps so you keep need you keep needing" }, { "start": 3609.16, "text": "to repeat this process so instead let's" }, { "start": 3611.68, "text": "define a new gradient descent algorithm" }, { "start": 3613.68, "text": "called SGD stochastic gradient descent" }, { "start": 3616.76, "text": "instead of computing the gradient over" }, { "start": 3618.48, "text": "the entire data set now let's just pick" }, { "start": 3621.68, "text": "a single training point and compute that" }, { "start": 3624.4, "text": "one training Point gradient" }, { "start": 3626.48, "text": "right the nice thing about that is that" }, { "start": 3628.839, "text": "it's much easier to compute that" }, { "start": 3630.72, "text": "gradient right it only needs one point" }, { "start": 3633.16, "text": "and the downside is that it's very noisy" }, { "start": 3636.28, "text": "it's very stochastic since it was" }, { "start": 3638.359, "text": "computed using just that one examples" }, { "start": 3640.2, "text": "right so you have that that tradeoff" }, { "start": 3641.96, "text": "that" }, { "start": 3642.72, "text": "exists so what's the middle ground right" }, { "start": 3645.24, "text": "the middle ground is to take not one" }, { "start": 3647.079, "text": "data point and not the full data set but" }, { "start": 3650.359, "text": "a batch of data right so take a what's" }, { "start": 3652.079, "text": "called a mini batch right this could be" }, { "start": 3653.799, "text": "something in practice like 32 pieces of" }, { "start": 3656.24, "text": "data is a common batch size and this" }, { "start": 3658.92, "text": "gives us an estimate of the true" }, { "start": 3660.839, "text": "gradient right so you approximate the" }, { "start": 3662.52, "text": "gradient by averaging the gradient of" }, { "start": 3664.599, "text": "these 32 samples it's still fast because" }, { "start": 3668.0, "text": "32 is much smaller than the size of your" }, { "start": 3670.24, "text": "entire data set but it's pretty quick" }, { "start": 3672.96, "text": "now right it's still noisy but it's okay" }, { "start": 3675.039, "text": "usually in practice because you can" }, { "start": 3676.359, "text": "still iterate much" }, { "start": 3678.4, "text": "faster and since B is normally not that" }, { "start": 3681.0, "text": "large again think of something like in" }, { "start": 3682.96, "text": "the tens or the hundreds of samples it's" }, { "start": 3686.0, "text": "very fast to compute this in practice" }, { "start": 3688.039, "text": "compared to regular gradient descent and" }, { "start": 3690.319, "text": "it's also much more accurate compared to" }, { "start": 3692.4, "text": "stochastic gradient descent and the" }, { "start": 3694.559, "text": "increase in accuracy of this gradient" }, { "start": 3697.0, "text": "estimation allows us to converge to our" }, { "start": 3699.52, "text": "solution 
significantly faster as well" }, { "start": 3702.44, "text": "right it's not only about the speed it's" }, { "start": 3704.359, "text": "just about the increase in accuracy of" }, { "start": 3706.2, "text": "those gradients allows us to get to our" }, { "start": 3708.4, "text": "solution much" }, { "start": 3709.92, "text": "faster which ultimately means that we" }, { "start": 3712.0, "text": "can train much faster as well and we can" }, { "start": 3714.039, "text": "save compute and the other really nice" }, { "start": 3716.88, "text": "thing about mini batches is that they" }, { "start": 3719.559, "text": "allow for parallelizing our computation" }, { "start": 3723.24, "text": "right and that was a concept that we had" }, { "start": 3724.64, "text": "talked about earlier in the class as" }, { "start": 3726.0, "text": "well and here's where it's coming in we" }, { "start": 3727.92, "text": "can split up those batches right so" }, { "start": 3730.079, "text": "those 32 pieces of data let's say if our" }, { "start": 3732.2, "text": "batch size is 32 we can split them up" }, { "start": 3734.68, "text": "onto different workers right different" }, { "start": 3737.079, "text": "parts of the GPU can tackle those" }, { "start": 3739.359, "text": "different parts of our data points this" }, { "start": 3742.839, "text": "can allow us to basically achieve even" }, { "start": 3744.599, "text": "more significant speed up using GPU" }, { "start": 3747.279, "text": "architectures and GPU Hardware okay" }, { "start": 3750.16, "text": "finally last topic I want to talk about" }, { "start": 3752.319, "text": "before we end this lecture and move on" }, { "start": 3754.16, "text": "to lecture number two is overfitting" }, { "start": 3757.079, "text": "right so overfitting is this idea that" }, { "start": 3759.559, "text": "is actually not a deep learning Centric" }, { "start": 3761.559, "text": "problem at all it's it's a problem that" }, { "start": 3763.0, "text": "exists in all of machine learning right" }, { "start": 3765.52, "text": "the key problem is that and the key" }, { "start": 3769.0, "text": "problem is actually one" }, { "start": 3771.44, "text": "that addresses how you can accurately" }, { "start": 3774.64, "text": "Define if if your model is is actually" }, { "start": 3778.319, "text": "capturing your true data set right or if" }, { "start": 3781.52, "text": "it's just learning kind of the subtle" }, { "start": 3783.44, "text": "details that are kind of sply" }, { "start": 3786.279, "text": "correlating to your data set so said" }, { "start": 3789.119, "text": "differently let me say it a bit" }, { "start": 3790.52, "text": "differently now so let's say we want to" }, { "start": 3793.4, "text": "build models that can learn" }, { "start": 3796.4, "text": "representations okay from our training" }, { "start": 3798.48, "text": "data that still generalize to brand new" }, { "start": 3801.72, "text": "unseen test points right that's the real" }, { "start": 3804.2, "text": "goal here is we want to teach our model" }, { "start": 3806.119, "text": "something based on a lot of training" }, { "start": 3807.4, "text": "data but then we don't want it to do" }, { "start": 3809.079, "text": "well in the training data we want it to" }, { "start": 3810.4, "text": "do well when we deploy it into the real" }, { "start": 3812.68, "text": "world and it's seeing things that it has" }, { "start": 3814.2, "text": "never seen during training so the" }, { "start": 3816.64, "text": "concept of overfitting is exactly" 
}, { "start": 3819.319, "text": "addressing that problem overfitting" }, { "start": 3821.48, "text": "means if if your model is doing very" }, { "start": 3825.319, "text": "well on your training data but very" }, { "start": 3827.0, "text": "badly in testing it pro it's that means" }, { "start": 3830.279, "text": "it's overfitting it's overfitting to the" }, { "start": 3832.96, "text": "training data that it saw on the other" }, { "start": 3834.64, "text": "hand there's also underfitting" }, { "start": 3836.319, "text": "right on the left hand side you can see" }, { "start": 3838.44, "text": "basically not fitting the data enough" }, { "start": 3841.48, "text": "which means that you know you're going" }, { "start": 3842.88, "text": "to achieve very similar performance on" }, { "start": 3844.48, "text": "your testing distribution but both are" }, { "start": 3846.799, "text": "underperforming the actual capabilities" }, { "start": 3849.279, "text": "of your system now ideally you want to" }, { "start": 3851.68, "text": "end up somewhere in the middle which is" }, { "start": 3853.88, "text": "not too complex where you're memorizing" }, { "start": 3856.039, "text": "all of the nuances in your training data" }, { "start": 3858.2, "text": "like on the right but you still want to" }, { "start": 3860.48, "text": "continue to perform well even based on" }, { "start": 3863.48, "text": "the brand new data so you're not" }, { "start": 3864.599, "text": "underfitting as well" }, { "start": 3866.599, "text": "so to talk to actually address this" }, { "start": 3868.64, "text": "problem in neural networks and in" }, { "start": 3870.2, "text": "machine learning in general there's a" }, { "start": 3871.44, "text": "few different ways that you should be" }, { "start": 3873.119, "text": "aware of and how to do it because you'll" }, { "start": 3874.96, "text": "need to apply them as part of your" }, { "start": 3877.279, "text": "Solutions and your software Labs as well" }, { "start": 3879.72, "text": "so the key concept here is called" }, { "start": 3881.559, "text": "regularization right regularization is a" }, { "start": 3883.88, "text": "technique that you can introduce and" }, { "start": 3886.559, "text": "said very simply all regularization is" }, { "start": 3889.2, "text": "is a way to discourage your model" }, { "start": 3893.119, "text": "from from these nuances in your training" }, { "start": 3897.0, "text": "data from being learned that's all it is" }, { "start": 3899.839, "text": "and as we've seen before it's actually" }, { "start": 3901.319, "text": "critical for our models to be able to" }, { "start": 3903.119, "text": "generalize you know not just on training" }, { "start": 3905.319, "text": "data but really what we care about is" }, { "start": 3907.16, "text": "the testing data so the most popular" }, { "start": 3909.92, "text": "regularization technique that's" }, { "start": 3911.599, "text": "important for you to understand is this" }, { "start": 3913.799, "text": "very simple idea called Dropout let's" }, { "start": 3916.92, "text": "revisit this picture of a deep neural" }, { "start": 3918.559, "text": "network that we've been seeing all" }, { "start": 3920.0, "text": "lecture right in Dropout our training" }, { "start": 3922.799, "text": "during training what we're going to do" }, { "start": 3924.88, "text": "is randomly set some of the activations" }, { "start": 3927.839, "text": "right these outputs of every single" }, { "start": 3929.799, "text": "neuron to zero we're just randomly 
going" }, { "start": 3932.559, "text": "to set them to zero with some" }, { "start": 3934.2, "text": "probability right so let's say 50% is" }, { "start": 3937.72, "text": "our probability that means that we're" }, { "start": 3940.0, "text": "going to take all of the activation in" }, { "start": 3942.64, "text": "our in our neural network and with a" }, { "start": 3944.92, "text": "probability of 50% before we pass that" }, { "start": 3947.359, "text": "activation onto the next neuron we're" }, { "start": 3949.4, "text": "just going to set it to zero and not" }, { "start": 3951.88, "text": "pass on anything so effectively 50% of" }, { "start": 3954.76, "text": "the neurons are are going to be kind of" }, { "start": 3957.359, "text": "shut down or killed in a forward pass" }, { "start": 3959.96, "text": "and you're only going to forward pass" }, { "start": 3961.64, "text": "information with the other 50% of your" }, { "start": 3964.079, "text": "neurons so this idea is extremely" }, { "start": 3966.64, "text": "powerful actually because it lowers the" }, { "start": 3968.599, "text": "capacity of our neural network it not" }, { "start": 3970.64, "text": "only lowers the capacity of our neural" }, { "start": 3972.359, "text": "network but it's dynamically lowering it" }, { "start": 3974.599, "text": "because on the next iteration we're" }, { "start": 3976.52, "text": "going to pick a different 50% of neurons" }, { "start": 3978.72, "text": "that we drop out so constantly the" }, { "start": 3980.68, "text": "network is going to have to learn to" }, { "start": 3982.68, "text": "build Pathways different pathways from" }, { "start": 3985.799, "text": "input to output and that it can't rely" }, { "start": 3988.16, "text": "on any small any small part of the" }, { "start": 3990.319, "text": "features that are present in any part of" }, { "start": 3992.52, "text": "the training data set too extensively" }, { "start": 3994.72, "text": "right because it's constantly being" }, { "start": 3995.96, "text": "forced to find these different Pathways" }, { "start": 3998.52, "text": "with random" }, { "start": 4000.359, "text": "probabilities so that's Dropout the" }, { "start": 4002.599, "text": "second regularization technique is going" }, { "start": 4004.76, "text": "to be this notion called early stopping" }, { "start": 4006.72, "text": "which is actually something that is" }, { "start": 4008.96, "text": "model agnostic you can apply this to any" }, { "start": 4011.039, "text": "type of model as long as you have a" }, { "start": 4012.44, "text": "testing set that you can play around" }, { "start": 4013.96, "text": "with so the idea here" }, { "start": 4016.039, "text": "is that we have already a pretty formal" }, { "start": 4019.0, "text": "mathematical definition of what it means" }, { "start": 4021.359, "text": "to overfit right overfitting is just" }, { "start": 4023.88, "text": "when our model starts to perform worse" }, { "start": 4026.0, "text": "on our test set that's really all it is" }, { "start": 4028.559, "text": "right so what if we plot over the course" }, { "start": 4031.44, "text": "of training so x-axis is as we're" }, { "start": 4033.16, "text": "training the model let's look at the" }, { "start": 4035.16, "text": "performance on both the training set and" }, { "start": 4037.24, "text": "the test set so in the beginning you can" }, { "start": 4040.039, "text": "see that the training set and the test" }, { "start": 4041.92, "text": "set are both going down and they" }, { "start": 4043.839, 
"text": "continue to go down uh which is" }, { "start": 4046.079, "text": "excellent because it means that our" }, { "start": 4047.16, "text": "model is getting stronger eventually" }, { "start": 4049.119, "text": "though what you'll notice is that the" }, { "start": 4050.92, "text": "test loss plateaus and starts to" }, { "start": 4054.72, "text": "increase on the other hand the training" }, { "start": 4057.0, "text": "loss there's no reason why the training" }, { "start": 4058.839, "text": "loss should ever need to stop going down" }, { "start": 4061.279, "text": "right training losses generally always" }, { "start": 4063.2, "text": "continue to Decay as long as there is" }, { "start": 4066.599, "text": "capacity in the neural network to learn" }, { "start": 4069.2, "text": "those differences right but the" }, { "start": 4070.72, "text": "important point is that this continues" }, { "start": 4073.24, "text": "for the rest of training and we want to" }, { "start": 4075.2, "text": "BAS basically we care about this point" }, { "start": 4077.64, "text": "right here right this is the really" }, { "start": 4079.119, "text": "important point because this is where we" }, { "start": 4081.76, "text": "need to stop training right after this" }, { "start": 4083.76, "text": "point this is the happy medium because" }, { "start": 4085.72, "text": "after this point we start to overfit on" }, { "start": 4089.319, "text": "parts of the data where our training" }, { "start": 4091.039, "text": "accuracy becomes actually better than" }, { "start": 4093.2, "text": "our testing accuracy so our testing" }, { "start": 4094.64, "text": "accuracy is going bad it's getting worse" }, { "start": 4097.319, "text": "but our training accuracy is still" }, { "start": 4098.719, "text": "improving so it means overfitting on the" }, { "start": 4100.88, "text": "other hand on the left hand" }, { "start": 4102.839, "text": "side this is the opposite problem right" }, { "start": 4105.64, "text": "we have not fully utilized the capacity" }, { "start": 4107.719, "text": "of our model and the testing accuracy" }, { "start": 4109.839, "text": "can still improve further right this is" }, { "start": 4112.48, "text": "a very powerful idea but it's actually" }, { "start": 4114.52, "text": "extremely easy to implement in practice" }, { "start": 4116.6, "text": "because all you really have to do is" }, { "start": 4118.279, "text": "just monitor the loss of over the course" }, { "start": 4120.759, "text": "of training right and you just have to" }, { "start": 4122.199, "text": "pick the model where the testing" }, { "start": 4123.96, "text": "accuracy starts to get" }, { "start": 4126.64, "text": "worse so I'll conclude this lecture by" }, { "start": 4128.92, "text": "just summarizing three key points that" }, { "start": 4130.92, "text": "we've cover covered in the class so far" }, { "start": 4133.319, "text": "and this is a very g-pack class so the" }, { "start": 4136.08, "text": "entire week is going to be like this and" }, { "start": 4138.08, "text": "today is just the start so so far we've" }, { "start": 4140.359, "text": "learned the fundamental building blocks" }, { "start": 4142.44, "text": "of neural network starting all the way" }, { "start": 4144.239, "text": "from just one neuron also called a" }, { "start": 4145.92, "text": "perceptron we learned that we can stack" }, { "start": 4148.48, "text": "these systems on top of each other to" }, { "start": 4151.0, "text": "create a hierarchical network and how we" }, { "start": 
4154.08, "text": "can mathematically optimize those types" }, { "start": 4156.279, "text": "of systems and then finally in the very" }, { "start": 4158.04, "text": "very last part of the class we talked" }, { "start": 4159.6, "text": "about just techniques tips and" }, { "start": 4161.719, "text": "techniques for actually training and" }, { "start": 4163.52, "text": "applying these systems into practice ice" }, { "start": 4166.359, "text": "now in the next lecture we're going to" }, { "start": 4167.88, "text": "hear from Ava on deep sequence modeling" }, { "start": 4170.759, "text": "using rnns and also a really new and" }, { "start": 4174.52, "text": "exciting algorithm and type of model" }, { "start": 4176.88, "text": "called the Transformer which uh is built" }, { "start": 4180.279, "text": "off of this principle of attention" }, { "start": 4182.239, "text": "you're going to learn about it in the" }, { "start": 4183.4, "text": "next class but let's for now just take a" }, { "start": 4185.679, "text": "brief pause and let's resume in about" }, { "start": 4187.64, "text": "five minutes just so we can switch" }, { "start": 4188.96, "text": "speakers and Ava can start her" }, { "start": 4191.199, "text": "presentation okay thank you" } ]