[00:01] [Music]

[00:10] Good afternoon, everyone, and welcome to MIT 6.S191. My name is Alexander Amini, and I'll be one of your instructors for the course this year, along with Ava. Together we're really excited to welcome you to this really incredible course. This is a very fast-paced and very intense one week that we're about to go through together. We're going to cover the foundations of an also very fast-moving field, a field that has been rapidly changing over the eight years that we have taught this course at MIT. Now, over the past decade, in fact even before we started teaching this course, AI and deep learning have been revolutionizing so many advances and so many areas of science: mathematics, physics, and so on. Not that long ago we were facing challenges and problems that we did not think were necessarily solvable in our lifetimes, and AI is now actually solving them beyond human performance. And each year that we teach this course, this lecture in particular gets harder and harder to teach, because for an introductory-level course, lecture number one is the lecture that's supposed to cover the foundations. If you think of any other introductory course, like a 101 course on mathematics or biology, those first lectures don't really change that much over time. But we're in a rapidly changing field of AI and deep learning, where even these lectures are rapidly changing. So let me give you an example of how we introduced this course only a few years ago.
[02:01] "Hi, everybody, and welcome to MIT 6.S191, the official introductory course on deep learning taught here at MIT. Deep learning is revolutionizing so many fields, from robotics to medicine and everything in between. You'll learn the fundamentals of this field and how you can build some of these incredible algorithms. In fact, this entire speech and video are not real and were created using deep learning and artificial intelligence, and in this class you'll learn how. It has been an honor to speak with you today, and I hope you enjoy the course."
[02:56] The really surprising thing about that video to me, when we first did it, was how viral it went a few years ago. Within just a couple of months of us teaching this course, that video got over a million views. People were shocked by a few things, but the main one was the realism of AI: the ability to generate content that looks and sounds extremely hyperrealistic. And when we created this video for the class only a few years ago, it took us about $10,000 in compute to generate just a roughly one-minute-long video. If you think about it, that's extremely expensive for something that looks like that. And maybe a lot of you are not really even impressed by that technology today, because you see all of the amazing things that AI and deep learning are producing now.
[03:55] Fast forward to today. People were making all kinds of exciting remarks about the progress in deep learning when that video came out a few years ago; now this is common stuff, because AI is really doing much more powerful things than this fun little introductory video. So today, fast forward about four years to right now: where are we? AI is now generating content with deep learning being so commoditized. Deep learning is at all of our fingertips now, online, in our smartphones, and so on. In fact, we can use deep learning to generate these types of hyperrealistic pieces of media and content entirely from English language, without even coding anymore. Before, we had to actually go in, train these models, and really code them to be able to create that one-minute-long video; today we have models that will do that for us, end to end, directly from English language. We can ask these models to create something that the world has never seen before, a photo of an astronaut riding a horse, and these models can imagine those pieces of content entirely from scratch.
[05:11] My personal favorite is actually how we can now ask these deep learning models, themselves being software, to create new types of software: for example, to write a piece of TensorFlow code to train a neural network. We're asking a neural network to write TensorFlow code to train another neural network, and our model can produce examples of functional and usable pieces of code that satisfy this English prompt, while walking through each part of the code independently. So it's not just producing the code, but actually educating and teaching the user what each of these code blocks is doing. You can see an example here.
[05:55] Really, what I'm trying to show you with all of this is just how far deep learning has come, even in the couple of years since we started teaching this course, and going back even before that, to eight years ago. The most amazing thing that you'll see in this course, in my opinion, is that what we try to do here is teach you the foundations of all of this: how all of these different types of models are created from the ground up, and how we can make all of these amazing advances possible, so that you can also do it on your own as well. And like I mentioned in the beginning, this introductory course is getting harder and harder to make every year. I don't know where the field is going to be next year, and that's my honest truth, or honestly even in one or two months' time from now, just because it's moving so incredibly fast. But what I do know is that what we will share with you in the course as part of this one week is going to be the foundations of all of the technologies that we have seen up until this point, which will allow you to create that future for yourselves and to design brand new types of deep learning models using those fundamentals and those foundations.
[07:12] So let's get started with all of that and start to figure out how we can actually achieve all of these different pieces and learn all of these different components. We should start by really tackling the foundations from the very beginning and asking ourselves about a term we've all heard. I think all of you, before you came to this class today, have heard the term deep learning, but it's important for you to really understand how this concept of deep learning relates to all of the other pieces of science that you've learned about so far. To do that, we have to start from the very beginning and start by thinking about what intelligence is, at its core: not even artificial intelligence, but just intelligence.
[07:56] The way I like to think about this is that intelligence is the ability to process information in a way that informs your future decision-making abilities. Now, that's something that we as humans do every single day. Artificial intelligence is simply the ability for us to give computers that same ability: to process information and inform future decisions. Machine learning is simply a subset of artificial intelligence; the way you should think of machine learning is as the science of trying to teach computers how to do that processing of information and decision-making from data. So instead of hard-coding rules into machines and programming them like we used to do in software engineering classes, now we're going to try to do that processing of information, and inform future decision-making abilities, directly from data. And then, going one step deeper, deep learning is simply the subset of machine learning which uses neural networks to do that: it uses neural networks to process raw, unprocessed pieces of data, ingest those very large data sets, and inform future decisions. That's exactly what this class is really all about. If I had to summarize this class in just one line, it's about teaching machines how to process data, process information, and learn decision-making abilities from that data.
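To make the contrast between hard-coded rules and learning from data concrete, here is a minimal sketch of my own, not taken from the course materials; the task, the data, and every name in it (`is_tall_hardcoded`, `learn_threshold`) are invented for illustration:

```python
# Hedged toy contrasting hard-coded rules with a rule learned from data.

# 1) Classic software engineering: a human hard-codes the decision rule.
def is_tall_hardcoded(height_cm):
    return height_cm > 180  # threshold chosen by a human

# 2) Machine learning: the same kind of rule, but the threshold is
#    estimated from labeled examples instead of being written by hand.
def learn_threshold(examples):
    """examples: list of (height_cm, label) pairs; returns the candidate
    threshold that classifies the training data with the fewest mistakes."""
    candidates = sorted(h for h, _ in examples)

    def errors(t):
        return sum((h > t) != label for h, label in examples)

    return min(candidates, key=errors)

# Invented training data: heights with True/False "tall" labels.
data = [(150, False), (165, False), (172, False), (185, True), (190, True)]
t = learn_threshold(data)
learned = lambda h: h > t

print(t)             # threshold picked from the data, not by a programmer
print(learned(188))  # a decision informed by data
```

The "learning" here is just picking the one number that makes the fewest mistakes on labeled examples; deep learning replaces that single number with millions of learned parameters, but the idea is the same: the rule comes from data, not from a programmer.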
[09:39] Now, this program is split between really two different parts: you should think of this class as being captured by both technical lectures, of which this is one, as well as software labs. We'll have several new updates this year, as I mentioned earlier, covering the rapidly changing advances in AI, and you're going to see those especially in some of the later lectures. The first lecture today is going to cover the foundations of neural networks themselves, starting with the building block of every single neural network, which is called the perceptron.
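The lecture will define the perceptron properly later on; as a hedged preview (this sketch is mine, not the course's notation), a single perceptron is just a weighted sum of its inputs plus a bias, passed through a step nonlinearity:

```python
# Hedged sketch of a perceptron; the weights and data are invented.

def perceptron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through a
    step activation: output 1 if the sum is positive, else 0."""
    s = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if s > 0 else 0

# With hand-picked weights, this single perceptron computes logical AND:
w, b = [1.0, 1.0], -1.5
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, perceptron([x1, x2], w, b))
# (0,0)->0, (0,1)->0, (1,0)->0, (1,1)->1
```

With hand-picked weights it computes AND; the point of training, covered later in the lecture, is to learn `w` and `b` from data instead of choosing them by hand.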
[10:17] Finally, we'll go through the week and conclude with a series of exciting guest lectures from industry-leading sponsors of the course. And on the software side, after every lecture you'll also get software and project-building experience, to be able to take what we teach in lectures, actually deploy it in real code, and produce things based on the learnings that you find in lecture. At the very end of the class, on the software side, you'll have the ability to participate in a really fun day, which is the project pitch competition. It's kind of like a Shark Tank-style competition of all of the different projects from all of you, where you can win some really awesome prizes.
[11:02] So let's step through that a little bit briefly; this is the syllabus part of the lecture. Each day we'll have dedicated software labs that will basically mirror all of the technical lectures that we go through, helping you reinforce your learnings, and again, each day these are coupled with prizes for the top-performing software solutions that come up in the class. This is going to start today with lab one, which is going to be on music generation: you're going to learn how to build a neural network that can learn from a bunch of musical songs, listen to them, and then learn to compose brand new songs in that same genre.
[11:40] Tomorrow, in lab two on computer vision, you're going to learn about facial detection systems. You'll build a facial detection system from scratch using convolutional neural networks (you'll learn what that means tomorrow), and you'll also learn how to debias, to remove the biases that exist in some of these facial detection systems, which is a huge problem for the state-of-the-art solutions that exist today.
[12:05] And finally, a brand new lab at the end of the course will focus on large language models, where you're actually going to take a multi-billion-parameter large language model and fine-tune it to build an assistive chatbot, and then evaluate a set of its cognitive abilities, ranging from mathematics to scientific reasoning to logical abilities and so on. Finally, at the very end, there will be a final project pitch competition, with up to five minutes per team, and all of these are accompanied by great prizes, so there will definitely be a lot of fun to be had throughout the week.
[12:44] There are many resources to help with this class; you'll see them posted here. You don't need to write them down, because all of the slides are already posted online. Please post to Piazza if you have any questions, and of course we have an amazing team that is helping to teach this course this year; you can reach out to any of us if you have any questions, and Piazza is a great place to start. Myself and Ava will be the two main lecturers for this course, Monday through Wednesday especially, and we'll also be hearing some amazing guest lectures in the second half of the course, which you'll definitely want to attend, because they really cover the state-of-the-art side of deep learning that's going on in industry, outside of academia.
[13:27] And very briefly, I just want to give a huge thanks to all of our sponsors, without whose support this course, like every year, would not be possible. Okay, so now let's start with the fun stuff, and my favorite part of the course, which is the technical part, and let's start by just asking ourselves a question: why do we care about all of this? Why do we care about deep learning? Why did you all come here today to learn and to listen to this course?
[13:56] To understand, I think we again need to go back a little bit, to understand how machine learning used to be performed. Machine learning would typically define a set of features, which you can think of as a set of things to look for in an image or in a piece of data. Usually these are hand-engineered, so humans would have to define them themselves, and the problem is that they tend to be very brittle in practice, just by nature of a human defining them. So the key idea of deep learning, and what you're going to learn throughout this entire week, is this paradigm shift of trying to move away from hand-engineering the features and rules that the computer should look for, and instead trying to learn them directly from raw pieces of data: what are the patterns that we need to look at in data sets, such that if we look at those patterns, we can make some interesting decisions, and interesting actions can come out?
[14:54] For example, if we wanted to learn how to detect faces, think about how you yourself would detect faces. If you look at a picture, what are you looking for to detect a face? You're looking for some particular patterns: you're looking for eyes and noses and ears, and when those things are all composed in a certain way, you would probably deduce that it's a face. Computers do something very similar: they have to understand what patterns to look for, what the eyes and noses and ears of those pieces of data are, and then from there actually detect and predict from them.
{ | |
"start": 930.959, | |
"text": "so the really interesting thing I think" | |
}, | |
{ | |
"start": 934.12, | |
"text": "about deep learning is that these" | |
}, | |
{ | |
"start": 936.12, | |
"text": "foundations for doing exactly what I" | |
}, | |
{ | |
"start": 938.44, | |
"text": "just mentioned picking out the building" | |
}, | |
{ | |
"start": 940.6, | |
"text": "blocks picking out the features from raw" | |
}, | |
{ | |
"start": 943.04, | |
"text": "pieces of data and the underlying" | |
}, | |
{ | |
"start": 945.199, | |
"text": "algorithms themselves have existed for" | |
}, | |
{ | |
"start": 947.6, | |
"text": "many many decades now the question I" | |
}, | |
{ | |
"start": 952.199, | |
"text": "would ask at this point is so why are we" | |
}, | |
{ | |
"start": 954.639, | |
"text": "studying this now and why is all of this" | |
}, | |
{ | |
"start": 956.519, | |
"text": "really blowing up right now and" | |
}, | |
{ | |
"start": 958.16, | |
"text": "exploding with so many great advances" | |
}, | |
{ | |
"start": 960.44, | |
"text": "well there are three things right" | |
}, | |
{ | |
"start": 962.639, | |
"text": "number one is that the data that is" | |
}, | |
{ | |
"start": 964.56, | |
"text": "available to us today is significantly" | |
}, | |
{ | |
"start": 967.839, | |
"text": "more pervasive these models are hungry" | |
}, | |
{ | |
"start": 970.199, | |
"text": "for data you're going to learn about" | |
}, | |
{ | |
"start": 971.68, | |
"text": "this more in detail but these models are" | |
}, | |
{ | |
"start": 973.759, | |
"text": "extremely hungry for data and we're" | |
}, | |
{ | |
"start": 975.92, | |
"text": "living in a world right now quite" | |
}, | |
{ | |
"start": 978.88, | |
"text": "frankly where data is more abundant than" | |
}, | |
{ | |
"start": 981.0, | |
"text": "it has ever been in our history now" | |
}, | |
{ | |
"start": 983.959, | |
"text": "secondly these algorithms are massively" | |
}, | |
{ | |
"start": 986.88, | |
"text": "compute hungry and they're" | |
}, | |
{ | |
"start": 988.36, | |
"text": "massively parallelizable which means" | |
}, | |
{ | |
"start": 990.6, | |
"text": "that they have greatly benefited from" | |
}, | |
{ | |
"start": 993.72, | |
"text": "compute Hardware which is also capable" | |
}, | |
{ | |
"start": 996.12, | |
"text": "of being parallelized the particular" | |
}, | |
{ | |
"start": 999.319, | |
"text": "name of that Hardware is called a GPU" | |
}, | |
{ | |
"start": 1001.68, | |
"text": "right gpus can run parallel processing" | |
}, | |
{ | |
"start": 1004.6, | |
"text": "uh streams of information and are" | |
}, | |
{ | |
"start": 1007.0, | |
"text": "particularly amenable to deep learning" | |
}, | |
{ | |
"start": 1008.8, | |
"text": "algorithms and the abundance of gpus and" | |
}, | |
{ | |
"start": 1011.279, | |
"text": "that compute Hardware has also pushed" | |
}, | |
{ | |
"start": 1013.639, | |
"text": "forward what we can do in deep learning" | |
}, | |
{ | |
"start": 1016.519, | |
"text": "and finally the last piece is the" | |
}, | |
{ | |
"start": 1018.44, | |
"text": "software" | |
}, | |
{ | |
"start": 1019.399, | |
"text": "right it's the open source tools that" | |
}, | |
{ | |
"start": 1021.639, | |
"text": "are really used as the foundational" | |
}, | |
{ | |
"start": 1024.52, | |
"text": "building blocks of deploying and" | |
}, | |
{ | |
"start": 1026.88, | |
"text": "building all of these underlying models" | |
}, | |
{ | |
"start": 1028.919, | |
"text": "that you're going to learn about in this" | |
}, | |
{ | |
"start": 1030.28, | |
"text": "course and those open source tools have" | |
}, | |
{ | |
"start": 1032.0, | |
"text": "just become extremely streamlined making" | |
}, | |
{ | |
"start": 1034.24, | |
"text": "this extremely easy for all of us to" | |
}, | |
{ | |
"start": 1037.16, | |
"text": "learn about these Technologies within an" | |
}, | |
{ | |
"start": 1039.24, | |
"text": "amazing one-week course like" | |
}, | |
{ | |
"start": 1041.52, | |
"text": "this so now that we have some of" | |
}, | |
{ | |
"start": 1044.12, | |
"text": "the background let's start with" | |
}, | |
{ | |
"start": 1046.88, | |
"text": "understanding exactly what is the" | |
}, | |
{ | |
"start": 1048.96, | |
"text": "fundamental building block of a neural" | |
}, | |
{ | |
"start": 1051.28, | |
"text": "network now that building block is" | |
}, | |
{ | |
"start": 1054.12, | |
"text": "called a perceptron right every single" | |
}, | |
{ | |
"start": 1056.96, | |
"text": "perceptron every single neural network is" | |
}, | |
{ | |
"start": 1058.96, | |
"text": "built up of multiple perceptrons and" | |
}, | |
{ | |
"start": 1061.919, | |
"text": "you're going to learn how those" | |
}, | |
{ | |
"start": 1063.48, | |
"text": "perceptrons number one compute" | |
}, | |
{ | |
"start": 1065.16, | |
"text": "information themselves and how they" | |
}, | |
{ | |
"start": 1066.64, | |
"text": "connect to these much larger billion" | |
}, | |
{ | |
"start": 1069.24, | |
"text": "parameter neural" | |
}, | |
{ | |
"start": 1071.2, | |
"text": "networks so the key idea of a perceptron" | |
}, | |
{ | |
"start": 1074.4, | |
"text": "or even simpler think of this as a" | |
}, | |
{ | |
"start": 1076.28, | |
"text": "single neuron right so a neural network" | |
}, | |
{ | |
"start": 1078.28, | |
"text": "is composed of many many neurons" | |
}, | |
{ | |
"start": 1080.72, | |
"text": "and a perceptron is just one neuron so" | |
}, | |
{ | |
"start": 1083.48, | |
"text": "that idea of a perceptron is actually" | |
}, | |
{ | |
"start": 1085.6, | |
"text": "extremely simple and I hope that by the" | |
}, | |
{ | |
"start": 1087.12, | |
"text": "end of today this idea and this uh" | |
}, | |
{ | |
"start": 1090.72, | |
"text": "processing of a perceptron becomes" | |
}, | |
{ | |
"start": 1092.88, | |
"text": "extremely clear to you so let's start by" | |
}, | |
{ | |
"start": 1095.159, | |
"text": "talking about just the forward" | |
}, | |
{ | |
"start": 1096.96, | |
"text": "propagation of information through a" | |
}, | |
{ | |
"start": 1099.28, | |
"text": "single neuron now single neurons ingest" | |
}, | |
{ | |
"start": 1102.799, | |
"text": "information they can actually ingest" | |
}, | |
{ | |
"start": 1105.08, | |
"text": "multiple pieces of information so here" | |
}, | |
{ | |
"start": 1107.24, | |
"text": "you can see this neuron takes as input" | |
}, | |
{ | |
"start": 1109.48, | |
"text": "three pieces of information X1 X2 and" | |
}, | |
{ | |
"start": 1112.88, | |
"text": "XM right so we Define the set of inputs" | |
}, | |
{ | |
"start": 1116.4, | |
"text": "called x 1 through M and each of these" | |
}, | |
{ | |
"start": 1119.6, | |
"text": "inputs each of these numbers is going to" | |
}, | |
{ | |
"start": 1121.679, | |
"text": "be elementwise multiplied by a" | |
}, | |
{ | |
"start": 1124.12, | |
"text": "particular weight so this is going to be" | |
}, | |
{ | |
"start": 1126.4, | |
"text": "denoted here by W1 through WM so this is" | |
}, | |
{ | |
"start": 1129.24, | |
"text": "a corresponding weight for every single" | |
}, | |
{ | |
"start": 1130.96, | |
"text": "input and you should think of this as" | |
}, | |
{ | |
"start": 1132.6, | |
"text": "really uh you know every weight being" | |
}, | |
{ | |
"start": 1134.96, | |
"text": "assigned to that input right the weights" | |
}, | |
{ | |
"start": 1137.96, | |
"text": "are part of the neuron itself now you" | |
}, | |
{ | |
"start": 1141.32, | |
"text": "multiply all of these inputs with their" | |
}, | |
{ | |
"start": 1143.32, | |
"text": "weights together and then you add them" | |
}, | |
{ | |
"start": 1144.88, | |
"text": "up you take this single number after that" | |
}, | |
{ | |
"start": 1147.559, | |
"text": "addition and you pass it through what's" | |
}, | |
{ | |
"start": 1149.679, | |
"text": "called a nonlinear activation function" | |
}, | |
{ | |
"start": 1152.12, | |
"text": "to produce your final output which here" | |
}, | |
{ | |
"start": 1154.039, | |
"text": "we're calling" | |
}, | |
{ | |
"start": 1158.159, | |
"text": "y now what I just said is not entirely" | |
}, | |
{ | |
"start": 1161.84, | |
"text": "correct right so I missed out one" | |
}, | |
{ | |
"start": 1163.799, | |
"text": "critical piece of information that piece" | |
}, | |
{ | |
"start": 1165.52, | |
"text": "of information is that we also have what" | |
}, | |
{ | |
"start": 1167.559, | |
"text": "you can see here is called this bias" | |
}, | |
{ | |
"start": 1169.6, | |
"text": "term that bias term is actually what" | |
}, | |
{ | |
"start": 1172.6, | |
"text": "allows your neuron to shift its" | |
}, | |
{ | |
"start": 1176.159, | |
"text": "activation function horizontally on that" | |
}, | |
{ | |
"start": 1178.679, | |
"text": "x axis if you think of it right so on" | |
}, | |
{ | |
"start": 1182.12, | |
"text": "the right side you can now see this" | |
}, | |
{ | |
"start": 1183.799, | |
"text": "diagram illustrating mathematically that" | |
}, | |
{ | |
"start": 1186.48, | |
"text": "single equation that I talked through" | |
}, | |
{ | |
"start": 1188.559, | |
"text": "kind of conceptually right now you can" | |
}, | |
{ | |
"start": 1190.159, | |
"text": "see it mathematically written down as" | |
}, | |
{ | |
"start": 1191.96, | |
"text": "one single equation and we can actually" | |
}, | |
{ | |
"start": 1194.28, | |
"text": "rewrite this using linear algebra using" | |
}, | |
{ | |
"start": 1196.96, | |
"text": "vectors and Dot products so let's do" | |
}, | |
{ | |
"start": 1199.28, | |
"text": "that right so now our inputs are going" | |
}, | |
{ | |
"start": 1200.919, | |
"text": "to be described by a capital x which is" | |
}, | |
{ | |
"start": 1203.96, | |
"text": "simply a vector of all of our inputs X1" | |
}, | |
{ | |
"start": 1206.84, | |
"text": "through XM and then our weights are" | |
}, | |
{ | |
"start": 1209.44, | |
"text": "going to be described by a capital W" | |
}, | |
{ | |
"start": 1212.12, | |
"text": "which is going to be uh W1 through WM" | |
}, | |
{ | |
"start": 1215.84, | |
"text": "the input is obtained by taking the dot" | |
}, | |
{ | |
"start": 1218.159, | |
"text": "product of X and W right that dot" | |
}, | |
{ | |
"start": 1221.799, | |
"text": "product does that element wise" | |
}, | |
{ | |
"start": 1223.08, | |
"text": "multiplication and then sums all of" | |
}, | |
{ | |
"start": 1226.0, | |
"text": "the element wise multiplications and" | |
}, | |
{ | |
"start": 1228.48, | |
"text": "then here's the missing piece is that" | |
}, | |
{ | |
"start": 1230.36, | |
"text": "we're now going to add that bias term" | |
}, | |
{ | |
"start": 1232.799, | |
"text": "here we're calling the bias term" | |
}, | |
{ | |
"start": 1234.72, | |
"text": "w0 right and then we're going to apply" | |
}, | |
{ | |
"start": 1236.919, | |
"text": "the nonlinearity which here is denoted as" | |
}, | |
{ | |
"start": 1239.52, | |
"text": "G so I've mentioned this" | |
}, | |
{ | |
"start": 1242.84, | |
"text": "nonlinearity a few times this activation" | |
}, | |
{ | |
"start": 1245.039, | |
"text": "function let's dig into it a little bit" | |
}, | |
{ | |
"start": 1247.039, | |
"text": "more so we can understand what is" | |
}, | |
{ | |
"start": 1248.88, | |
"text": "actually this activation function doing" | |
}, | |
{ | |
"start": 1251.48, | |
"text": "well I said a couple things about it I" | |
}, | |
{ | |
"start": 1253.36, | |
"text": "said it's a nonlinear function right" | |
}, | |
{ | |
"start": 1255.679, | |
"text": "here you can see one example of an" | |
}, | |
{ | |
"start": 1257.96, | |
"text": "activation function one" | |
}, | |
{ | |
"start": 1261.24, | |
"text": "commonly used activation function is" | |
}, | |
{ | |
"start": 1263.96, | |
"text": "called the sigmoid function which you" | |
}, | |
{ | |
"start": 1265.72, | |
"text": "can actually see here on the bottom" | |
}, | |
{ | |
"start": 1267.159, | |
"text": "right hand side of the screen the" | |
}, | |
{ | |
"start": 1268.919, | |
"text": "sigmoid function is very commonly used" | |
}, | |
{ | |
"start": 1271.679, | |
"text": "because of its outputs right so it takes" | |
}, | |
{ | |
"start": 1274.039, | |
"text": "as input any real number the x-axis is" | |
}, | |
{ | |
"start": 1276.559, | |
"text": "infinite plus or minus but on the y-axis" | |
}, | |
{ | |
"start": 1280.039, | |
"text": "it basically squashes every input X into" | |
}, | |
{ | |
"start": 1284.4, | |
"text": "a number between zero and one so it's" | |
}, | |
{ | |
"start": 1286.48, | |
"text": "actually a very common choice for things" | |
}, | |
{ | |
"start": 1288.24, | |
"text": "like probability distributions if you" | |
}, | |
{ | |
"start": 1290.0, | |
"text": "want to convert your answers into" | |
}, | |
{ | |
"start": 1291.559, | |
"text": "probabilities or learn or teach a neuron" | |
}, | |
{ | |
"start": 1294.32, | |
"text": "to learn a probability" | |
}, | |
{ | |
"start": 1296.44, | |
"text": "distribution but in fact there are" | |
}, | |
{ | |
"start": 1298.52, | |
"text": "actually many different types of" | |
}, | |
{ | |
"start": 1299.88, | |
"text": "nonlinear activation functions that are" | |
}, | |
{ | |
"start": 1302.24, | |
"text": "used in neural networks and here are" | |
}, | |
{ | |
"start": 1303.919, | |
"text": "some common ones and and again" | |
}, | |
{ | |
"start": 1305.4, | |
"text": "throughout this presentation you'll see" | |
}, | |
{ | |
"start": 1307.4, | |
"text": "these little tensorflow icons actually" | |
}, | |
{ | |
"start": 1309.84, | |
"text": "throughout the entire course you'll see" | |
}, | |
{ | |
"start": 1311.039, | |
"text": "these tensorflow icons on the bottom" | |
}, | |
{ | |
"start": 1313.12, | |
"text": "which basically just allow you to uh" | |
}, | |
{ | |
"start": 1315.919, | |
"text": "relate some of the foundational" | |
}, | |
{ | |
"start": 1317.64, | |
"text": "knowledge that we're teaching in the" | |
}, | |
{ | |
"start": 1319.36, | |
"text": "lectures to some of the software labs" | |
}, | |
{ | |
"start": 1321.48, | |
"text": "and this might provide a good starting" | |
}, | |
{ | |
"start": 1323.12, | |
"text": "point for a lot of the pieces that you" | |
}, | |
{ | |
"start": 1324.559, | |
"text": "have to do later on in the software" | |
}, | |
{ | |
"start": 1326.76, | |
"text": "parts of the class so the sigmoid" | |
}, | |
{ | |
"start": 1329.4, | |
"text": "activation which we talked about in the" | |
}, | |
{ | |
"start": 1331.0, | |
"text": "last slide here it's shown on the left" | |
}, | |
{ | |
"start": 1332.48, | |
"text": "hand side right this is very popular" | |
}, | |
{ | |
"start": 1334.679, | |
"text": "because of the probability distributions" | |
}, | |
{ | |
"start": 1336.32, | |
"text": "right it squashes everything between" | |
}, | |
{ | |
"start": 1337.679, | |
"text": "zero and one but you see two other uh" | |
}, | |
{ | |
"start": 1340.48, | |
"text": "very common types of activation" | |
}, | |
{ | |
"start": 1342.64, | |
"text": "functions in the middle and the right" | |
}, | |
{ | |
"start": 1344.32, | |
"text": "hand side as well so the other very very" | |
}, | |
{ | |
"start": 1347.039, | |
"text": "common one probably the most popular" | |
}, | |
{ | |
"start": 1349.08, | |
"text": "activation function now is on the" | |
}, | |
{ | |
"start": 1350.84, | |
"text": "far right hand" | |
}, | |
{ | |
"start": 1352.64, | |
"text": "side it's called the relu activation" | |
}, | |
{ | |
"start": 1354.919, | |
"text": "function or also called the rectified" | |
}, | |
{ | |
"start": 1356.72, | |
"text": "linear unit so basically it's linear" | |
}, | |
{ | |
"start": 1359.08, | |
"text": "everywhere except there's a nonlinearity" | |
}, | |
{ | |
"start": 1361.279, | |
"text": "at x equals zero so there's a kind of a" | |
}, | |
{ | |
"start": 1364.039, | |
"text": "step or a break discontinuity right so" | |
}, | |
{ | |
"start": 1366.96, | |
"text": "the benefit of this is it's very easy to compute it" | |
}, | |
{ | |
"start": 1369.44, | |
"text": "still has the nonlinearity which we kind" | |
}, | |
{ | |
"start": 1371.44, | |
"text": "of need and we'll talk about why we need" | |
}, | |
{ | |
"start": 1372.96, | |
"text": "it in one second but it's very fast" | |
}, | |
{ | |
"start": 1375.72, | |
"text": "right just two linear functions" | |
}, | |
{ | |
"start": 1377.32, | |
"text": "piecewise combined with each" | |
}, | |
{ | |
"start": 1379.44, | |
"text": "other okay so now let's talk about why" | |
}, | |
{ | |
"start": 1381.72, | |
"text": "we need a nonlinearity in the first" | |
}, | |
{ | |
"start": 1383.72, | |
"text": "place why not just deal with a" | |
}, | |
{ | |
"start": 1386.12, | |
"text": "linear function that we pass all of" | |
}, | |
{ | |
"start": 1387.679, | |
"text": "these inputs through so the point of the" | |
}, | |
{ | |
"start": 1390.039, | |
"text": "activation function even at all why do" | |
}, | |
{ | |
"start": 1392.799, | |
"text": "we have this is to introduce" | |
}, | |
{ | |
"start": 1395.279, | |
"text": "nonlinearities in and of itself so what we" | |
}, | |
{ | |
"start": 1398.6, | |
"text": "want to do is to allow our neural" | |
}, | |
{ | |
"start": 1401.2, | |
"text": "network to deal with nonlinear data" | |
}, | |
{ | |
"start": 1404.64, | |
"text": "right our neural networks need the" | |
}, | |
{ | |
"start": 1406.76, | |
"text": "ability to deal with nonlinear data" | |
}, | |
{ | |
"start": 1408.72, | |
"text": "because the world is extremely nonlinear" | |
}, | |
{ | |
"start": 1412.4, | |
"text": "right this is important because you know" | |
}, | |
{ | |
"start": 1414.559, | |
"text": "if you think of the real world real data" | |
}, | |
{ | |
"start": 1416.679, | |
"text": "sets this is just the way they are right" | |
}, | |
{ | |
"start": 1419.4, | |
"text": "if you look at data sets like this one" | |
}, | |
{ | |
"start": 1421.24, | |
"text": "green and red points right and I ask you" | |
}, | |
{ | |
"start": 1423.279, | |
"text": "to build a neural network that can" | |
}, | |
{ | |
"start": 1425.76, | |
"text": "separate the green and the red points" | |
}, | |
{ | |
"start": 1428.559, | |
"text": "this means that we actually need a" | |
}, | |
{ | |
"start": 1431.2, | |
"text": "nonlinear function to do that we cannot" | |
}, | |
{ | |
"start": 1432.96, | |
"text": "solve this problem with a single line" | |
}, | |
{ | |
"start": 1435.88, | |
"text": "right in fact if we used" | |
}, | |
{ | |
"start": 1439.559, | |
"text": "linear functions as our activation" | |
}, | |
{ | |
"start": 1441.679, | |
"text": "function no matter how big your neural" | |
}, | |
{ | |
"start": 1443.72, | |
"text": "network is it's still a linear function" | |
}, | |
{ | |
"start": 1445.919, | |
"text": "because linear functions combined with" | |
}, | |
{ | |
"start": 1447.36, | |
"text": "linear functions are still linear so no" | |
}, | |
{ | |
"start": 1449.96, | |
"text": "matter how deep or how many parameters" | |
}, | |
{ | |
"start": 1451.72, | |
"text": "your neural network has the best they" | |
}, | |
{ | |
"start": 1453.64, | |
"text": "would be able to do to separate these" | |
}, | |
{ | |
"start": 1455.24, | |
"text": "green and red points would look like" | |
}, | |
{ | |
"start": 1456.679, | |
"text": "this but adding nonlinearities allows" | |
}, | |
{ | |
"start": 1459.64, | |
"text": "our neural networks to be smaller by" | |
}, | |
{ | |
"start": 1462.48, | |
"text": "allowing them to be more expressive and" | |
}, | |
{ | |
"start": 1464.64, | |
"text": "capture more complexities in the data" | |
}, | |
{ | |
"start": 1466.919, | |
"text": "sets and this allows them to be much" | |
}, | |
{ | |
"start": 1468.6, | |
"text": "more powerful in the end so let's" | |
}, | |
{ | |
"start": 1472.12, | |
"text": "understand this with a simple example" | |
}, | |
{ | |
"start": 1474.0, | |
"text": "imagine I give you now this trained" | |
}, | |
{ | |
"start": 1475.76, | |
"text": "neural network so what does it mean" | |
}, | |
{ | |
"start": 1476.96, | |
"text": "trained neural network it means now I'm" | |
}, | |
{ | |
"start": 1478.44, | |
"text": "giving you the weights right not only" | |
}, | |
{ | |
"start": 1480.52, | |
"text": "the inputs but I'm going to tell you" | |
}, | |
{ | |
"start": 1482.279, | |
"text": "what the weights of this neural network" | |
}, | |
{ | |
"start": 1483.64, | |
"text": "are so here let's say the bias term w0" | |
}, | |
{ | |
"start": 1487.279, | |
"text": "is going to be one and our W Vector is" | |
}, | |
{ | |
"start": 1490.799, | |
"text": "going to be 3 and -2 right these are" | |
}, | |
{ | |
"start": 1493.76, | |
"text": "just the weights of your trained neural" | |
}, | |
{ | |
"start": 1494.96, | |
"text": "network let's worry about how we got" | |
}, | |
{ | |
"start": 1496.679, | |
"text": "those weights in a second but this" | |
}, | |
{ | |
"start": 1498.799, | |
"text": "network has two inputs X1 and X2 now if" | |
}, | |
{ | |
"start": 1503.36, | |
"text": "we want to get the output of this neural" | |
}, | |
{ | |
"start": 1505.88, | |
"text": "network all we have to do simply is to" | |
}, | |
{ | |
"start": 1508.52, | |
"text": "do the same story that we talked about" | |
}, | |
{ | |
"start": 1510.12, | |
"text": "before right it's dot" | |
}, | |
{ | |
"start": 1512.919, | |
"text": "product inputs with weights add the bias" | |
}, | |
{ | |
"start": 1517.48, | |
"text": "and apply the nonlinearity right and" | |
}, | |
{ | |
"start": 1519.24, | |
"text": "those are the three components that you" | |
}, | |
{ | |
"start": 1520.72, | |
"text": "really have to remember as part of this" | |
}, | |
{ | |
"start": 1522.64, | |
"text": "class right dot product uh add the bias" | |
}, | |
{ | |
"start": 1526.64, | |
"text": "and apply a nonlinearity that's going to" | |
}, | |
{ | |
"start": 1528.799, | |
"text": "be the process that keeps repeating over" | |
}, | |
{ | |
"start": 1530.48, | |
"text": "and over and over again for every single" | |
}, | |
{ | |
"start": 1532.799, | |
"text": "neuron after that happens that neuron" | |
}, | |
{ | |
"start": 1535.679, | |
"text": "is going to output a single number" | |
}, | |
{ | |
"start": 1538.24, | |
"text": "right now let's take a look at what's" | |
}, | |
{ | |
"start": 1540.159, | |
"text": "inside of that nonlinearity it's simply" | |
}, | |
{ | |
"start": 1542.88, | |
"text": "a weighted combination of" | |
}, | |
{ | |
"start": 1547.399, | |
"text": "those inputs with those weights right so" | |
}, | |
{ | |
"start": 1549.24, | |
"text": "if we look at what's inside of G right" | |
}, | |
{ | |
"start": 1552.399, | |
"text": "inside of G is a weighted combination of" | |
}, | |
{ | |
"start": 1554.72, | |
"text": "X and" | |
}, | |
{ | |
"start": 1555.72, | |
"text": "W right added with a bias" | |
}, | |
{ | |
"start": 1558.919, | |
"text": "right that's going to produce a single" | |
}, | |
{ | |
"start": 1561.52, | |
"text": "number right but in reality for any" | |
}, | |
{ | |
"start": 1564.12, | |
"text": "input that this model could see what" | |
}, | |
{ | |
"start": 1566.48, | |
"text": "this really is is a two-dimensional line" | |
}, | |
{ | |
"start": 1568.52, | |
"text": "because we have two parameters in this" | |
}, | |
{ | |
"start": 1571.039, | |
"text": "model so we can actually plot that line" | |
}, | |
{ | |
"start": 1574.12, | |
"text": "we can see exactly how this neuron" | |
}, | |
{ | |
"start": 1578.0, | |
"text": "separates points on these axes between" | |
}, | |
{ | |
"start": 1581.32, | |
"text": "X1 and X2 right these are the two inputs" | |
}, | |
{ | |
"start": 1583.84, | |
"text": "of this model we can see exactly and" | |
}, | |
{ | |
"start": 1586.559, | |
"text": "interpret exactly what this neuron is is" | |
}, | |
{ | |
"start": 1588.48, | |
"text": "doing right we can visualize its entire" | |
}, | |
{ | |
"start": 1590.679, | |
"text": "space because we can plot the line that" | |
}, | |
{ | |
"start": 1593.0, | |
"text": "defines this neuron right so here we're" | |
}, | |
{ | |
"start": 1595.559, | |
"text": "plotting when that line equals" | |
}, | |
{ | |
"start": 1597.72, | |
"text": "zero and in fact if I give" | |
}, | |
{ | |
"start": 1601.279, | |
"text": "that neuron a new data point" | |
}, | |
{ | |
"start": 1603.72, | |
"text": "here the new data point is X1 = -1 and" | |
}, | |
{ | |
"start": 1606.559, | |
"text": "X2 = 2 just an arbitrary point in this" | |
}, | |
{ | |
"start": 1609.2, | |
"text": "two-dimensional space we can plot that" | |
}, | |
{ | |
"start": 1611.32, | |
"text": "point in the two-dimensional space And" | |
}, | |
{ | |
"start": 1613.24, | |
"text": "depending on which side of the line it" | |
}, | |
{ | |
"start": 1615.0, | |
"text": "falls on it tells us you know" | |
}, | |
{ | |
"start": 1618.36, | |
"text": "what the answer is going to be what the" | |
}, | |
{ | |
"start": 1619.919, | |
"text": "sign of the answer is going to be and" | |
}, | |
{ | |
"start": 1622.0, | |
"text": "also what the answer itself is going to" | |
}, | |
{ | |
"start": 1623.799, | |
"text": "be right so if we follow that" | |
}, | |
{ | |
"start": 1625.96, | |
"text": "equation written on the top here and" | |
}, | |
{ | |
"start": 1627.88, | |
"text": "plug in -1 and 2 we're going to get 1 -" | |
}, | |
{ | |
"start": 1631.279, | |
"text": "3 - 4 which equals" | |
}, | |
{ | |
"start": 1634.44, | |
"text": "-6 right and when I put that into my" | |
}, | |
{ | |
"start": 1637.36, | |
"text": "nonlinearity G I'm going to get a final" | |
}, | |
{ | |
"start": 1640.559, | |
"text": "output of" | |
}, | |
{ | |
"start": 1643.12, | |
"text": "0.2 right so don't worry about" | |
}, | |
{ | |
"start": 1645.64, | |
"text": "the final output that's just going to be" | |
}, | |
{ | |
"start": 1647.039, | |
"text": "the output for that sigmoid function but" | |
}, | |
{ | |
"start": 1649.52, | |
"text": "the important point to remember here is" | |
}, | |
{ | |
"start": 1651.88, | |
"text": "that the sigmoid function actually" | |
}, | |
{ | |
"start": 1653.52, | |
"text": "divides the space into these two parts" | |
}, | |
{ | |
"start": 1656.799, | |
"text": "right it squashes everything between zero" | |
}, | |
{ | |
"start": 1659.08, | |
"text": "and one but it divides it implicitly by" | |
}, | |
{ | |
"start": 1662.279, | |
"text": "everything less than 0.5 and greater" | |
}, | |
{ | |
"start": 1665.159, | |
"text": "than 0.5 depending on if x is" | |
}, | |
{ | |
"start": 1668.279, | |
"text": "less than zero or greater than zero so" | |
}, | |
{ | |
"start": 1671.159, | |
"text": "depending on which side of the line that" | |
}, | |
{ | |
"start": 1673.08, | |
"text": "you fall on remember the line is when x" | |
}, | |
{ | |
"start": 1675.76, | |
"text": "equals zero the input to the sigmoid is" | |
}, | |
{ | |
"start": 1677.64, | |
"text": "zero if you fall on the left side of the" | |
}, | |
{ | |
"start": 1680.159, | |
"text": "line your output will be less than 0.5" | |
}, | |
{ | |
"start": 1684.08, | |
"text": "because you're falling on the negative" | |
}, | |
{ | |
"start": 1685.72, | |
"text": "side of the line if" | |
}, | |
{ | |
"start": 1688.2, | |
"text": "your input is on the right side of the" | |
}, | |
{ | |
"start": 1689.88, | |
"text": "line now your output is going to be" | |
}, | |
{ | |
"start": 1692.84, | |
"text": "greater than" | |
}, | |
{ | |
"start": 1694.279, | |
"text": "0.5 right so here we can actually" | |
}, | |
{ | |
"start": 1696.679, | |
"text": "visualize this space this is called the" | |
}, | |
{ | |
"start": 1698.72, | |
"text": "feature space of a neural network we can" | |
}, | |
{ | |
"start": 1701.2, | |
"text": "visualize it in its entirety right we" | |
}, | |
{ | |
"start": 1704.08, | |
"text": "can totally visualize and interpret this" | |
}, | |
{ | |
"start": 1706.08, | |
"text": "neural network we can understand exactly" | |
}, | |
{ | |
"start": 1708.24, | |
"text": "what it's going to do for any input that" | |
}, | |
{ | |
"start": 1710.36, | |
"text": "it sees right but of course this is a" | |
}, | |
{ | |
"start": 1712.88, | |
"text": "very simple neuron right it's not a" | |
}, | |
{ | |
"start": 1714.6, | |
"text": "neural network it's just one neuron and" | |
}, | |
{ | |
"start": 1716.84, | |
"text": "even more than that it's even a very" | |
}, | |
{ | |
"start": 1718.519, | |
"text": "simple neuron it only has two inputs" | |
}, | |
{ | |
"start": 1721.08, | |
"text": "right so in reality the types of" | |
}, | |
{ | |
"start": 1724.24, | |
"text": "neurons that you're going to be dealing" | |
}, | |
{ | |
"start": 1725.64, | |
"text": "with in this course are going to be" | |
}, | |
{ | |
"start": 1727.64, | |
"text": "neurons and neural networks with" | |
}, | |
{ | |
"start": 1730.32, | |
"text": "millions or even billions of these" | |
}, | |
{ | |
"start": 1732.84, | |
"text": "parameters right of these inputs right" | |
}, | |
{ | |
"start": 1735.2, | |
"text": "so here we only have two weights W1 W2" | |
}, | |
{ | |
"start": 1738.24, | |
"text": "but today's neural networks have" | |
}, | |
{ | |
"start": 1739.84, | |
"text": "billions of these parameters so drawing" | |
}, | |
{ | |
"start": 1742.679, | |
"text": "these types of plots that you see here" | |
}, | |
{ | |
"start": 1745.6, | |
"text": "obviously becomes a lot more challenging" | |
}, | |
{ | |
"start": 1747.679, | |
"text": "it's actually not" | |
}, | |
{ | |
"start": 1749.919, | |
"text": "possible but now that we have some of" | |
}, | |
{ | |
"start": 1751.96, | |
"text": "the intuition behind a perceptron let's" | |
}, | |
{ | |
"start": 1754.6, | |
"text": "start now by building neural networks" | |
}, | |
{ | |
"start": 1757.559, | |
"text": "and seeing how all of this comes" | |
}, | |
{ | |
"start": 1759.44, | |
"text": "together so let's revisit that previous" | |
}, | |
{ | |
"start": 1761.679, | |
"text": "diagram of a perceptron now again if" | |
}, | |
{ | |
"start": 1764.6, | |
"text": "there's only one thing to take away from" | |
}, | |
{ | |
"start": 1766.799, | |
"text": "this lecture right now it's to remember" | |
}, | |
{ | |
"start": 1769.799, | |
"text": "how a perceptron works that equation of" | |
}, | |
{ | |
"start": 1772.279, | |
"text": "a perceptron is extremely important for" | |
}, | |
{ | |
"start": 1774.32, | |
"text": "every single class that comes after" | |
}, | |
{ | |
"start": 1775.799, | |
"text": "today and there's only three steps it's" | |
}, | |
{ | |
"start": 1778.32, | |
"text": "dot product with the inputs add a bias" | |
}, | |
{ | |
"start": 1781.6, | |
"text": "and apply your" | |
}, | |
{ | |
"start": 1783.24, | |
"text": "nonlinearity let's simplify the diagram" | |
}, | |
{ | |
"start": 1785.519, | |
"text": "a little bit I'll remove the weight" | |
}, | |
{ | |
"start": 1787.72, | |
"text": "labels from this picture and now you can" | |
}, | |
{ | |
"start": 1790.32, | |
"text": "assume that if I show a line every" | |
}, | |
{ | |
"start": 1792.72, | |
"text": "single line has an Associated weight" | |
}, | |
{ | |
"start": 1795.36, | |
"text": "that comes with that line right I'll" | |
}, | |
{ | |
"start": 1797.88, | |
"text": "also also remove the bias term for" | |
}, | |
{ | |
"start": 1799.559, | |
"text": "Simplicity assume that every neuron has" | |
}, | |
{ | |
"start": 1801.799, | |
"text": "that bias term I don't need to show it" | |
}, | |
{ | |
"start": 1804.159, | |
"text": "and now note that the result here now" | |
}, | |
{ | |
"start": 1807.279, | |
"text": "calling it Z which is just the uh dot" | |
}, | |
{ | |
"start": 1810.44, | |
"text": "product plus bias before the" | |
}, | |
{ | |
"start": 1813.0, | |
"text": "nonlinearity is the output is going to" | |
}, | |
{ | |
"start": 1815.88, | |
"text": "be linear first of all it's just a it's" | |
}, | |
{ | |
"start": 1817.64, | |
"text": "just a weighted sum of all those pieces" | |
}, | |
{ | |
"start": 1819.48, | |
"text": "we have not applied the nonlinearity yet" | |
}, | |
{ | |
"start": 1821.76, | |
"text": "but our final output is just going to be" | |
}, | |
{ | |
"start": 1824.48, | |
"text": "G of Z it's the activation function or" | |
}, | |
{ | |
"start": 1827.159, | |
"text": "nonlinear activ function applied to" | |
}, | |
{ | |
"start": 1830.799, | |
"text": "Z now if we want to step this up a" | |
}, | |
{ | |
"start": 1833.799, | |
"text": "little bit more and say what if we had a" | |
}, | |
{ | |
"start": 1837.72, | |
"text": "multi-output function now we don't just" | |
}, | |
{ | |
"start": 1839.88, | |
"text": "have one output but let's say we want to" | |
}, | |
{ | |
"start": 1841.48, | |
"text": "have two outputs well now we can just" | |
}, | |
{ | |
"start": 1843.48, | |
"text": "have two neurons in this network right" | |
}, | |
{ | |
"start": 1846.84, | |
"text": "every neuron say sees all of the inputs" | |
}, | |
{ | |
"start": 1849.76, | |
"text": "that came before it but now you see the" | |
}, | |
{ | |
"start": 1852.2, | |
"text": "top neuron is going to be predicting an" | |
}, | |
{ | |
"start": 1854.76, | |
"text": "answer and the bottom neuron will" | |
}, | |
{ | |
"start": 1856.12, | |
"text": "predict its own answer now importantly" | |
}, | |
{ | |
"start": 1858.159, | |
"text": "one thing you should really notice here" | |
}, | |
{ | |
"start": 1859.519, | |
"text": "is that each neuron has its own weights" | |
}, | |
{ | |
"start": 1863.519, | |
"text": "right each neuron has its own lines that" | |
}, | |
{ | |
"start": 1865.639, | |
"text": "are coming into just that neuron right" | |
}, | |
{ | |
"start": 1867.96, | |
"text": "so they're acting independently but they" | |
}, | |
{ | |
"start": 1870.08, | |
"text": "can later on communicate if you have" | |
}, | |
{ | |
"start": 1872.039, | |
"text": "another" | |
}, | |
{ | |
"start": 1873.24, | |
"text": "layer" | |
}, | |
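The idea that every output neuron sees the same inputs but carries its own weights can be sketched with NumPy (an illustration with made-up weights, not the lecture's code — each column of W holds one neuron's weights):

```python
import numpy as np

def layer(x, W, b):
    """A layer of perceptrons: each column of W is one neuron's weights,
    so every neuron sees the same inputs x but acts independently."""
    z = x @ W + b                    # dot products plus per-neuron biases
    return 1.0 / (1.0 + np.exp(-z))  # elementwise sigmoid nonlinearity

x = np.array([1.0, -2.0, 0.5])       # three inputs
W = np.array([[ 0.2, -1.0],
              [ 0.5,  0.3],
              [-0.1,  0.8]])         # 3 inputs x 2 neurons
b = np.array([0.0, 0.1])             # one bias per neuron
print(layer(x, W, b))                # two independent outputs, each in (0, 1)
```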
{ | |
"start": 1876.24, | |
"text": "right so let's start now by initializing" | |
}, | |
{ | |
"start": 1880.32, | |
"text": "this uh this process a bit further and" | |
}, | |
{ | |
"start": 1883.639, | |
"text": "thinking about it more programmatically" | |
}, | |
{ | |
"start": 1885.679, | |
"text": "right what if we wanted to program this" | |
}, | |
{ | |
"start": 1887.919, | |
"text": "this neural network ourselves from" | |
}, | |
{ | |
"start": 1890.2, | |
"text": "scratch right remember that equation I" | |
}, | |
{ | |
"start": 1891.96, | |
"text": "told you it didn't sound very complex" | |
}, | |
{ | |
"start": 1893.639, | |
"text": "it's take a DOT product add a bias which" | |
}, | |
{ | |
"start": 1896.32, | |
"text": "is a single number and apply" | |
}, | |
{ | |
"start": 1898.08, | |
"text": "nonlinearity let's see how we would" | |
}, | |
{ | |
"start": 1899.6, | |
"text": "actually Implement something like that" | |
}, | |
{ | |
"start": 1901.44, | |
"text": "so to to define the layer right we're" | |
}, | |
{ | |
"start": 1904.12, | |
"text": "now going to call this a layer uh which" | |
}, | |
{ | |
"start": 1906.639, | |
"text": "is a collection of neurons right we have" | |
}, | |
{ | |
"start": 1910.799, | |
"text": "to first Define how that information" | |
}, | |
{ | |
"start": 1913.36, | |
"text": "propagates through the network so we can" | |
}, | |
{ | |
"start": 1915.639, | |
"text": "do that by creating a call function here" | |
}, | |
{ | |
"start": 1918.0, | |
"text": "first we're going to actually Define the" | |
}, | |
{ | |
"start": 1919.76, | |
"text": "weights for that Network right so" | |
}, | |
{ | |
"start": 1922.159, | |
"text": "remember every Network every neuron I" | |
}, | |
{ | |
"start": 1924.519, | |
"text": "should say every neuron has weights and" | |
}, | |
{ | |
"start": 1926.679, | |
"text": "a bias right so let's define those first" | |
}, | |
{ | |
"start": 1929.84, | |
"text": "we're going to create the call function" | |
}, | |
{ | |
"start": 1931.799, | |
"text": "to actually see how we can pass" | |
}, | |
{ | |
"start": 1935.12, | |
"text": "information through that layer right so" | |
}, | |
{ | |
"start": 1938.2, | |
"text": "this is going to take us input and" | |
}, | |
{ | |
"start": 1939.76, | |
"text": "inputs right this is like what we" | |
}, | |
{ | |
"start": 1941.639, | |
"text": "previously called X and it's the same" | |
}, | |
{ | |
"start": 1944.679, | |
"text": "story that we've been seeing this whole" | |
}, | |
{ | |
"start": 1946.44, | |
"text": "class right we're going to Matrix" | |
}, | |
{ | |
"start": 1948.76, | |
"text": "multiply or take a DOT product of our" | |
}, | |
{ | |
"start": 1950.679, | |
"text": "inputs with our" | |
}, | |
{ | |
"start": 1952.159, | |
"text": "weights we're going to add a bias and" | |
}, | |
{ | |
"start": 1955.279, | |
"text": "then we're going to apply a nonlinearity" | |
}, | |
{ | |
"start": 1957.639, | |
"text": "it's really that simple right we've now" | |
}, | |
{ | |
"start": 1959.919, | |
"text": "created a single layer neural" | |
}, | |
{ | |
"start": 1963.639, | |
"text": "network right so this this line in" | |
}, | |
{ | |
"start": 1966.559, | |
"text": "particular this is the part that allows" | |
}, | |
{ | |
"start": 1968.279, | |
"text": "us to" | |
}, | |
{ | |
"start": 1969.519, | |
"text": "be a powerful neural network maintaining" | |
}, | |
{ | |
"start": 1972.559, | |
"text": "that" | |
}, | |
{ | |
"start": 1973.559, | |
"text": "nonlinearity and the important thing" | |
}, | |
{ | |
"start": 1976.12, | |
"text": "here is to note that" | |
}, | |
{ | |
"start": 1979.0, | |
"text": "modern deep learning toolboxes and" | |
}, | |
{ | |
"start": 1981.24, | |
"text": "libraries already Implement a lot of" | |
}, | |
{ | |
"start": 1983.36, | |
"text": "these for you right so it's important" | |
}, | |
{ | |
"start": 1985.2, | |
"text": "for you to understand the foundations" | |
}, | |
{ | |
"start": 1987.32, | |
"text": "but in practice all of that layer" | |
}, | |
{ | |
"start": 1990.0, | |
"text": "architecture and all that layer logic is" | |
}, | |
{ | |
"start": 1992.639, | |
"text": "actually implemented in tools like" | |
}, | |
{ | |
"start": 1994.799, | |
"text": "tensorflow and P torch through a dense" | |
}, | |
{ | |
"start": 1997.32, | |
"text": "layer right so here you can see an" | |
}, | |
{ | |
"start": 1998.799, | |
"text": "example of calling or creating" | |
}, | |
{ | |
"start": 2002.0, | |
"text": "initializing a dense layer with two" | |
}, | |
{ | |
"start": 2005.84, | |
"text": "neurons right allowing it to feed in an" | |
}, | |
{ | |
"start": 2008.96, | |
"text": "arbitrary set of inputs here we're" | |
}, | |
{ | |
"start": 2010.639, | |
"text": "seeing these two neurons in a layer" | |
}, | |
{ | |
"start": 2013.12, | |
"text": "being fed three inputs right and in code" | |
}, | |
{ | |
"start": 2016.32, | |
"text": "it's only reduced down to this one line" | |
}, | |
{ | |
"start": 2018.72, | |
"text": "of tensorflow code making it extremely" | |
}, | |
{ | |
"start": 2020.679, | |
"text": "easy and convenient for us to use these" | |
}, | |
{ | |
"start": 2023.559, | |
"text": "functions and call them so now let's" | |
}, | |
{ | |
"start": 2026.159, | |
"text": "look at our single layered neural" | |
}, | |
{ | |
"start": 2028.08, | |
"text": "network this is where we have now one" | |
}, | |
{ | |
"start": 2030.519, | |
"text": "layer between our input and our outputs" | |
}, | |
{ | |
"start": 2033.639, | |
"text": "right so we're slowly and progressively" | |
}, | |
{ | |
"start": 2036.039, | |
"text": "increasing the complexity of our neural" | |
}, | |
{ | |
"start": 2038.2, | |
"text": "network so that we can build up all of" | |
}, | |
{ | |
"start": 2039.84, | |
"text": "these building blocks right this layer" | |
}, | |
{ | |
"start": 2043.48, | |
"text": "in the middle is called a hidden layer" | |
}, | |
{ | |
"start": 2046.44, | |
"text": "right obviously because you don't" | |
}, | |
{ | |
"start": 2047.679, | |
"text": "directly observe it you don't directly" | |
}, | |
{ | |
"start": 2049.24, | |
"text": "supervise it right you do observe the" | |
}, | |
{ | |
"start": 2051.839, | |
"text": "two input and output layers but your" | |
}, | |
{ | |
"start": 2053.599, | |
"text": "hidden layer is just kind of a uh a" | |
}, | |
{ | |
"start": 2056.159, | |
"text": "neuron neuron layer that you don't" | |
}, | |
{ | |
"start": 2058.599, | |
"text": "directly observe right it just gives" | |
}, | |
{ | |
"start": 2060.28, | |
"text": "your network more capacity more learning" | |
}, | |
{ | |
"start": 2063.72, | |
"text": "complexity and since we now have a" | |
}, | |
{ | |
"start": 2065.599, | |
"text": "transformation function from inputs to" | |
}, | |
{ | |
"start": 2068.0, | |
"text": "Hidden layers and hidden layers to" | |
}, | |
{ | |
"start": 2070.159, | |
"text": "Output we now have a two- layered neural" | |
}, | |
{ | |
"start": 2073.24, | |
"text": "network right which means that we also" | |
}, | |
{ | |
"start": 2076.2, | |
"text": "have two weight matrices right we don't" | |
}, | |
{ | |
"start": 2078.839, | |
"text": "have just the W1 which we previously had" | |
}, | |
{ | |
"start": 2081.72, | |
"text": "to create this hidden layer but now we" | |
}, | |
{ | |
"start": 2083.28, | |
"text": "also have W2 which does the" | |
}, | |
{ | |
"start": 2085.04, | |
"text": "transformation from hidden layer to" | |
}, | |
{ | |
"start": 2086.44, | |
"text": "Output layer yes what happens" | |
}, | |
{ | |
"start": 2088.96, | |
"text": "nonlinearity in Hidden you have just" | |
}, | |
{ | |
"start": 2091.04, | |
"text": "linear so there's no it's not is it a" | |
}, | |
{ | |
"start": 2093.52, | |
"text": "perceptron or not yes so every hidden" | |
}, | |
{ | |
"start": 2096.32, | |
"text": "layer also has an nonlinearity" | |
}, | |
{ | |
"start": 2098.64, | |
"text": "accompanied with it right and that's a" | |
}, | |
{ | |
"start": 2100.4, | |
"text": "very important point because if you" | |
}, | |
{ | |
"start": 2101.72, | |
"text": "don't have that perceptron then it's" | |
}, | |
{ | |
"start": 2103.56, | |
"text": "just a very large linear function" | |
}, | |
{ | |
"start": 2105.68, | |
"text": "followed by a final nonlinearity at the" | |
}, | |
{ | |
"start": 2107.64, | |
"text": "very end right so you need that" | |
}, | |
{ | |
"start": 2109.8, | |
"text": "cascading and uh you know overlapping" | |
}, | |
{ | |
"start": 2113.24, | |
"text": "application of nonlinearities that occur" | |
}, | |
{ | |
"start": 2115.839, | |
"text": "throughout the" | |
}, | |
{ | |
"start": 2117.599, | |
"text": "network" | |
}, | |
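The point just raised — that without per-layer nonlinearities the stack collapses into one big linear function — can be checked numerically (a small sketch with random made-up weights):

```python
import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(3, 4)), rng.normal(size=(4, 2))
x = rng.normal(size=3)

# Two "layers" with no nonlinearity in between...
two_linear = (x @ W1) @ W2
# ...equal a single linear layer whose weight matrix is the product W1 @ W2
one_linear = x @ (W1 @ W2)

print(np.allclose(two_linear, one_linear))  # -> True
```

This is why the cascading nonlinearities between layers are what give a deep network more expressive power than a single linear map.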
{ | |
"start": 2119.56, | |
"text": "awesome okay so now let's zoom in look" | |
}, | |
{ | |
"start": 2122.88, | |
"text": "at a single unit in the hidden layer" | |
}, | |
{ | |
"start": 2125.28, | |
"text": "take this one for example let's call it" | |
}, | |
{ | |
"start": 2127.079, | |
"text": "Z2 right it's the second neuron in the" | |
}, | |
{ | |
"start": 2129.4, | |
"text": "first layer right it's the same" | |
}, | |
{ | |
"start": 2131.72, | |
"text": "perception that we saw before we compute" | |
}, | |
{ | |
"start": 2134.2, | |
"text": "its answer by taking a DOT product of" | |
}, | |
{ | |
"start": 2136.599, | |
"text": "its weights with its inputs adding a" | |
}, | |
{ | |
"start": 2139.56, | |
"text": "bias and then applying a nonlinearity if" | |
}, | |
{ | |
"start": 2142.32, | |
"text": "we took a different hidden nodee like Z3" | |
}, | |
{ | |
"start": 2145.2, | |
"text": "the one right below it we would compute" | |
}, | |
{ | |
"start": 2147.48, | |
"text": "its answer exactly the same way that we" | |
}, | |
{ | |
"start": 2149.119, | |
"text": "computed Z2 except its weights would be" | |
}, | |
{ | |
"start": 2151.76, | |
"text": "different than the weights of Z2" | |
}, | |
{ | |
"start": 2153.24, | |
"text": "everything else stays exactly the same" | |
}, | |
{ | |
"start": 2154.839, | |
"text": "it sees the same inputs but of course" | |
}, | |
{ | |
"start": 2157.2, | |
"text": "you know I'm not going to actually show" | |
}, | |
{ | |
"start": 2158.599, | |
"text": "Z3 in this picture and now this picture" | |
}, | |
{ | |
"start": 2161.2, | |
"text": "is getting a little bit messy so let's" | |
}, | |
{ | |
"start": 2162.72, | |
"text": "clean things up a little bit more I'm" | |
}, | |
{ | |
"start": 2164.119, | |
"text": "going to remove all the lines now and" | |
}, | |
{ | |
"start": 2165.92, | |
"text": "replace them just with these these boxes" | |
}, | |
{ | |
"start": 2168.48, | |
"text": "these symbols that will denote what we" | |
}, | |
{ | |
"start": 2171.079, | |
"text": "call a fully connected layer right so" | |
}, | |
{ | |
"start": 2173.16, | |
"text": "these layers now denote that everything" | |
}, | |
{ | |
"start": 2175.359, | |
"text": "in our input is connected to everything" | |
}, | |
{ | |
"start": 2176.92, | |
"text": "in our output and the transformation is" | |
}, | |
{ | |
"start": 2179.0, | |
"text": "exactly as we saw before dot product" | |
}, | |
{ | |
"start": 2181.28, | |
"text": "bias and" | |
}, | |
{ | |
"start": 2184.599, | |
"text": "nonlinearity and again in code to do" | |
}, | |
{ | |
"start": 2187.24, | |
"text": "this is extremely straightforward with" | |
}, | |
{ | |
"start": 2189.0, | |
"text": "the foundations that we've built up from" | |
}, | |
{ | |
"start": 2190.76, | |
"text": "the beginning of the class we can now" | |
}, | |
{ | |
"start": 2192.8, | |
"text": "just Define two of these dense layers" | |
}, | |
{ | |
"start": 2195.4, | |
"text": "right our hidden layer on line one with" | |
}, | |
{ | |
"start": 2197.68, | |
"text": "n hidden units and then our output layer" | |
}, | |
{ | |
"start": 2200.839, | |
"text": "with two hidden output units does that" | |
}, | |
{ | |
"start": 2203.359, | |
"text": "mean the nonlinearity function must be" | |
}, | |
{ | |
"start": 2205.079, | |
"text": "the same between layers nonlinearity" | |
}, | |
{ | |
"start": 2207.599, | |
"text": "function does not need to be the same" | |
}, | |
{ | |
"start": 2208.96, | |
"text": "through through each layer often times" | |
}, | |
{ | |
"start": 2211.24, | |
"text": "it is because of convenience there's" | |
}, | |
{ | |
"start": 2214.64, | |
"text": "there are some cases where you would" | |
}, | |
{ | |
"start": 2216.079, | |
"text": "want it to be different as well" | |
}, | |
{ | |
"start": 2218.0, | |
"text": "especially in lecture two you're going" | |
}, | |
{ | |
"start": 2220.079, | |
"text": "to see nonlinearities be different even" | |
}, | |
{ | |
"start": 2222.359, | |
"text": "within the same layer um let alone" | |
}, | |
{ | |
"start": 2225.2, | |
"text": "different layers but uh unless for a" | |
}, | |
{ | |
"start": 2229.2, | |
"text": "particular reason generally convention" | |
}, | |
{ | |
"start": 2230.92, | |
"text": "is there's no need to keep them" | |
}, | |
{ | |
"start": 2234.04, | |
"text": "differently now let's keep expanding our" | |
}, | |
{ | |
"start": 2237.2, | |
"text": "knowledge a little bit more if we now" | |
}, | |
{ | |
"start": 2238.599, | |
"text": "want to make a deep neural network not" | |
}, | |
{ | |
"start": 2240.48, | |
"text": "just a neural network like we saw in the" | |
}, | |
{ | |
"start": 2242.64, | |
"text": "previous side now it's deep all that" | |
}, | |
{ | |
"start": 2244.28, | |
"text": "means is that we're now going to stack" | |
}, | |
{ | |
"start": 2246.359, | |
"text": "these layers on top of each other one by" | |
}, | |
{ | |
"start": 2248.319, | |
"text": "one more and more creating a" | |
}, | |
{ | |
"start": 2250.56, | |
"text": "hierarchical model right the ones where" | |
}, | |
{ | |
"start": 2253.2, | |
"text": "the final output is now going to be" | |
}, | |
{ | |
"start": 2255.52, | |
"text": "computed by going deeper and deeper and" | |
}, | |
{ | |
"start": 2257.52, | |
"text": "deeper into the neural network and again" | |
}, | |
{ | |
"start": 2261.28, | |
"text": "doing this in code again follows the" | |
}, | |
{ | |
"start": 2263.56, | |
"text": "exact same story as before just" | |
}, | |
{ | |
"start": 2265.24, | |
"text": "cascading these tensorflow layers on top" | |
}, | |
{ | |
"start": 2268.359, | |
"text": "of each other and just going deeper into" | |
}, | |
{ | |
"start": 2270.68, | |
"text": "the" | |
}, | |
{ | |
"start": 2272.4, | |
"text": "network okay so now this is great" | |
}, | |
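Stacking layers to go deeper can be sketched as a simple loop over (weights, bias) pairs — each layer's output, after its nonlinearity, becomes the next layer's input. This mirrors what cascading dense layers does for you; the weights here are random placeholders, and the example input [4.0, 5.0] anticipates the four-lectures, five-hours example:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, layers):
    """Pass x through a stack of (W, b) layers: dot product, bias,
    nonlinearity at every layer, feeding each output into the next."""
    for W, b in layers:
        x = sigmoid(x @ W + b)
    return x

rng = np.random.default_rng(0)
layers = [(rng.normal(size=(2, 3)), np.zeros(3)),  # input -> hidden (3 units)
          (rng.normal(size=(3, 1)), np.zeros(1))]  # hidden -> output
print(forward(np.array([4.0, 5.0]), layers))       # a single output in (0, 1)
```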
{ | |
"start": 2275.0, | |
"text": "because now we have at least a solid" | |
}, | |
{ | |
"start": 2276.96, | |
"text": "foundational understanding of how to not" | |
}, | |
{ | |
"start": 2279.28, | |
"text": "only Define a single neuron but how to" | |
}, | |
{ | |
"start": 2281.319, | |
"text": "define an entire neural network and you" | |
}, | |
{ | |
"start": 2283.0, | |
"text": "should be able to actually explain at" | |
}, | |
{ | |
"start": 2284.76, | |
"text": "this point or understand how information" | |
}, | |
{ | |
"start": 2287.4, | |
"text": "goes from input through an entire neural" | |
}, | |
{ | |
"start": 2290.68, | |
"text": "network to compute an output so now" | |
}, | |
{ | |
"start": 2293.68, | |
"text": "let's look at how we can apply these" | |
}, | |
{ | |
"start": 2295.44, | |
"text": "neural networks to solve a very real" | |
}, | |
{ | |
"start": 2298.2, | |
"text": "problem that uh I'm sure all of you care" | |
}, | |
{ | |
"start": 2300.52, | |
"text": "about so here's a problem on how we want" | |
}, | |
{ | |
"start": 2302.839, | |
"text": "to build an AI system to learn to answer" | |
}, | |
{ | |
"start": 2305.24, | |
"text": "the following question which is will I" | |
}, | |
{ | |
"start": 2307.92, | |
"text": "pass this class right I'm sure all of" | |
}, | |
{ | |
"start": 2310.079, | |
"text": "you are really worried about this" | |
}, | |
{ | |
"start": 2312.52, | |
"text": "question um so to do this let's start" | |
}, | |
{ | |
"start": 2315.359, | |
"text": "with a simple input feature model the" | |
}, | |
{ | |
"start": 2318.28, | |
"text": "feature the two features that let's" | |
}, | |
{ | |
"start": 2320.48, | |
"text": "concern ourselves with are going to be" | |
}, | |
{ | |
"start": 2322.24, | |
"text": "number one how many lectures you attend" | |
}, | |
{ | |
"start": 2325.56, | |
"text": "and number two how many hours you spend" | |
}, | |
{ | |
"start": 2329.2, | |
"text": "on your final" | |
}, | |
{ | |
"start": 2330.599, | |
"text": "project so let's look at some of the" | |
}, | |
{ | |
"start": 2333.599, | |
"text": "past years of this class right we can" | |
}, | |
{ | |
"start": 2335.64, | |
"text": "actually observe how different people" | |
}, | |
{ | |
"start": 2338.48, | |
"text": "have uh lived in this space right" | |
}, | |
{ | |
"start": 2341.64, | |
"text": "between how many lectures and how much" | |
}, | |
{ | |
"start": 2343.44, | |
"text": "time You' spent on your final project" | |
}, | |
{ | |
"start": 2345.319, | |
"text": "and you can actually see every point is" | |
}, | |
{ | |
"start": 2347.2, | |
"text": "a person the color of that point is" | |
}, | |
{ | |
"start": 2349.599, | |
"text": "going to be if they passed or failed the" | |
}, | |
{ | |
"start": 2351.2, | |
"text": "class and you can see and visualize kind" | |
}, | |
{ | |
"start": 2353.76, | |
"text": "of this V this feature space if you will" | |
}, | |
{ | |
"start": 2356.64, | |
"text": "that we talked about before and then we" | |
}, | |
{ | |
"start": 2358.4, | |
"text": "have you you fall right here you're the" | |
}, | |
{ | |
"start": 2360.839, | |
"text": "point" | |
}, | |
{ | |
"start": 2361.88, | |
"text": "45 uh right in between the the this uh" | |
}, | |
{ | |
"start": 2365.92, | |
"text": "feature space you've attended four" | |
}, | |
{ | |
"start": 2368.119, | |
"text": "lectures and you will spend 5 hours on" | |
}, | |
{ | |
"start": 2370.04, | |
"text": "the final project and you want to build" | |
}, | |
{ | |
"start": 2372.0, | |
"text": "a neural network to determine given" | |
}, | |
{ | |
"start": 2374.68, | |
"text": "everyone else in the class right that" | |
}, | |
{ | |
"start": 2376.88, | |
"text": "I've seen from all of the previous years" | |
}, | |
{ | |
"start": 2379.2, | |
"text": "you want to help you want to have your" | |
}, | |
{ | |
"start": 2381.04, | |
"text": "neural network help you to understand" | |
}, | |
{ | |
"start": 2383.599, | |
"text": "what is your likelihood that you will" | |
}, | |
{ | |
"start": 2386.24, | |
"text": "pass or fail this class so let's do it" | |
}, | |
{ | |
"start": 2389.119, | |
"text": "we now have all of the building blocks" | |
}, | |
{ | |
"start": 2390.68, | |
"text": "to solve this problem using a neural" | |
}, | |
{ | |
"start": 2392.28, | |
"text": "network let's do it so we have two" | |
}, | |
{ | |
"start": 2394.319, | |
"text": "inputs those inputs are the number of" | |
}, | |
{ | |
"start": 2396.4, | |
"text": "lectures you attend and number of hours" | |
}, | |
{ | |
"start": 2398.44, | |
"text": "you spend on your final project it's" | |
}, | |
{ | |
"start": 2400.599, | |
"text": "four and five we can pass those two" | |
}, | |
{ | |
"start": 2402.16, | |
"text": "inputs to our two uh X1 and X2 variables" | |
}, | |
{ | |
"start": 2407.04, | |
"text": "these are fed into this single layered" | |
}, | |
{ | |
"start": 2410.04, | |
"text": "single hidden layered neural network it" | |
}, | |
{ | |
"start": 2412.96, | |
"text": "has three hidden units in the middle and" | |
}, | |
{ | |
"start": 2415.319, | |
"text": "we can see that the final predicted" | |
}, | |
{ | |
"start": 2417.04, | |
"text": "output probability for you to pass this" | |
}, | |
{ | |
"start": 2419.2, | |
"text": "class is 0.1 or 10% right so very Bleak" | |
}, | |
{ | |
"start": 2423.2, | |
"text": "outcome it's not a good outcome um the" | |
}, | |
{ | |
"start": 2427.04, | |
"text": "actual ual probability is one right so" | |
}, | |
{ | |
"start": 2430.8, | |
"text": "attending four out of the five lectures" | |
}, | |
{ | |
"start": 2432.359, | |
"text": "and spending 5 hours in your final" | |
}, | |
{ | |
"start": 2433.92, | |
"text": "project you actually lived in a part of" | |
}, | |
{ | |
"start": 2435.52, | |
"text": "the feature space which was actually" | |
}, | |
{ | |
"start": 2436.92, | |
"text": "very positive right it looked like you" | |
}, | |
{ | |
"start": 2438.24, | |
"text": "were going to pass the class so what" | |
}, | |
{ | |
"start": 2439.8, | |
"text": "happened here anyone have any ideas so" | |
}, | |
{ | |
"start": 2441.92, | |
"text": "why did the neural network get this so" | |
}, | |
{ | |
"start": 2443.68, | |
"text": "terribly wrong right it's not trained" | |
}, | |
{ | |
"start": 2446.92, | |
"text": "exactly so this neural network is not" | |
}, | |
{ | |
"start": 2448.44, | |
"text": "trained we haven't shown any of that" | |
}, | |
{ | |
"start": 2450.76, | |
"text": "data the green and red data right so you" | |
}, | |
{ | |
"start": 2453.72, | |
"text": "should really think of neural networks" | |
}, | |
{ | |
"start": 2455.76, | |
"text": "like babies right before they see data" | |
}, | |
{ | |
"start": 2458.72, | |
"text": "they haven't learned anything there's no" | |
}, | |
{ | |
"start": 2460.96, | |
"text": "expectation that we should have for them" | |
}, | |
{ | |
"start": 2462.92, | |
"text": "to be able to solve any of these types" | |
}, | |
{ | |
"start": 2464.359, | |
"text": "of problems before we teach them" | |
}, | |
{ | |
"start": 2465.96, | |
"text": "something about the world so let's teach" | |
}, | |
{ | |
"start": 2468.24, | |
"text": "this neural network something about uh" | |
}, | |
{ | |
"start": 2470.44, | |
"text": "the problem first right and to train it" | |
}, | |
{ | |
"start": 2472.599, | |
"text": "we first need to tell our neural network" | |
}, | |
{ | |
"start": 2475.92, | |
"text": "when it's making bad decisions right so" | |
}, | |
{ | |
"start": 2478.359, | |
"text": "we need to teach it right really train" | |
}, | |
{ | |
"start": 2480.56, | |
"text": "it to learn exactly like how we as" | |
}, | |
{ | |
"start": 2482.92, | |
"text": "humans learn in some ways right so we" | |
}, | |
{ | |
"start": 2484.96, | |
"text": "have to inform the neural network when" | |
}, | |
{ | |
"start": 2486.96, | |
"text": "it gets the answer incorrect so that it" | |
}, | |
{ | |
"start": 2489.16, | |
"text": "can learn how to get the answer correct" | |
}, | |
{ | |
"start": 2492.28, | |
"text": "right so the closer the answer is to the" | |
}, | |
{ | |
"start": 2495.359, | |
"text": "ground truth so right so for example the" | |
}, | |
{ | |
"start": 2497.76, | |
"text": "actual value for you passing this class" | |
}, | |
{ | |
"start": 2500.04, | |
"text": "was probability one 100% but it" | |
}, | |
{ | |
"start": 2502.88, | |
"text": "predicted a probability of" | |
}, | |
{ | |
"start": 2504.76, | |
"text": "0.1 we compute what's called a loss" | |
}, | |
{ | |
"start": 2507.76, | |
"text": "right so the closer these two things are" | |
}, | |
{ | |
"start": 2509.72, | |
"text": "together the smaller your loss should be" | |
}, | |
{ | |
"start": 2512.319, | |
"text": "and the and the more accurate your model" | |
}, | |
{ | |
"start": 2514.359, | |
"text": "should" | |
}, | |
{ | |
"start": 2515.76, | |
"text": "be so let's assume that we have data not" | |
}, | |
{ | |
"start": 2518.76, | |
"text": "just from one student but now we have" | |
}, | |
{ | |
"start": 2521.119, | |
"text": "data from many students we many students" | |
}, | |
{ | |
"start": 2523.28, | |
"text": "have taken this class before and we can" | |
}, | |
{ | |
"start": 2524.64, | |
"text": "plug all of them into the neural network" | |
}, | |
{ | |
"start": 2526.119, | |
"text": "and show them all to this to this system" | |
}, | |
{ | |
"start": 2528.72, | |
"text": "now we care not only about how the" | |
}, | |
{ | |
"start": 2530.76, | |
"text": "neural network did on just this one" | |
}, | |
{ | |
"start": 2532.68, | |
"text": "prediction but we care about how it" | |
}, | |
{ | |
"start": 2534.76, | |
"text": "predicted on all of these different" | |
}, | |
{ | |
"start": 2536.72, | |
"text": "people that the neural network has shown" | |
}, | |
{ | |
"start": 2538.839, | |
"text": "in the past as well during this training" | |
}, | |
{ | |
"start": 2541.2, | |
"text": "and learning process so when training" | |
}, | |
{ | |
"start": 2543.559, | |
"text": "the neural network we want to find a" | |
}, | |
{ | |
"start": 2545.119, | |
"text": "network that minimizes the empirical" | |
}, | |
{ | |
"start": 2549.04, | |
"text": "loss between our predictions and those" | |
}, | |
{ | |
"start": 2552.16, | |
"text": "ground truth outputs and we're going to" | |
}, | |
{ | |
"start": 2553.68, | |
"text": "do this on average across all of the" | |
}, | |
{ | |
"start": 2556.359, | |
"text": "different inputs that the that the model" | |
}, | |
{ | |
"start": 2559.48, | |
"text": "has" | |
}, | |
{ | |
"start": 2560.48, | |
"text": "seen if we look at this problem of" | |
}, | |
{ | |
"start": 2562.88, | |
"text": "binary" | |
}, | |
{ | |
"start": 2563.92, | |
"text": "classification right between yeses and" | |
}, | |
{ | |
"start": 2566.68, | |
"text": "NOS right will I pass the class or will" | |
}, | |
{ | |
"start": 2568.96, | |
"text": "I not pass the class it's a zero or one" | |
}, | |
{ | |
"start": 2572.16, | |
"text": "probability and we can use what is" | |
}, | |
{ | |
"start": 2574.079, | |
"text": "called the softmax function or the" | |
}, | |
{ | |
"start": 2575.96, | |
"text": "softmax cross entry function to be able" | |
}, | |
{ | |
"start": 2578.68, | |
"text": "to inform if this network is getting the" | |
}, | |
{ | |
"start": 2581.76, | |
"text": "answer correct or incorrect right the" | |
}, | |
{ | |
"start": 2584.079, | |
"text": "softmax cross or the cross entropy" | |
}, | |
{ | |
"start": 2585.96, | |
"text": "function think of this as a as an" | |
}, | |
{ | |
"start": 2587.76, | |
"text": "objective function it's a loss function" | |
}, | |
{ | |
"start": 2590.0, | |
"text": "that tells our neural network how far" | |
}, | |
{ | |
"start": 2592.64, | |
"text": "away these two probability distributions" | |
}, | |
{ | |
"start": 2594.68, | |
"text": "are right so the output is a probability" | |
}, | |
{ | |
"start": 2597.2, | |
"text": "distribution we're trying to determine" | |
}, | |
{ | |
"start": 2599.079, | |
"text": "how bad of an answer the neural network" | |
}, | |
{ | |
"start": 2601.96, | |
"text": "is predicting so that we can give it" | |
}, | |
{ | |
"start": 2603.48, | |
"text": "feedback to get a better" | |
}, | |
{ | |
"start": 2605.319, | |
"text": "answer now let's suppose in instead of" | |
}, | |
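For the binary yes/no case just described, the cross-entropy loss can be written directly, averaged over all the students the network has seen — so it covers both the per-example loss and the empirical average from earlier (a sketch, with clipping added for numerical safety):

```python
import math

def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean cross-entropy between true 0/1 labels and predicted probabilities:
    the further a prediction is from its label, the larger the loss."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# A confident wrong prediction (0.1 for a true "pass") costs far more
# than a close one (0.9 for a true "pass")
print(binary_cross_entropy([1.0], [0.1]))  # ~2.30
print(binary_cross_entropy([1.0], [0.9]))  # ~0.105
```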
{ | |
"start": 2607.52, | |
"text": "training a or predicting a binary output" | |
}, | |
{ | |
"start": 2610.559, | |
"text": "we want to predict a real valued output" | |
}, | |
{ | |
"start": 2613.48, | |
"text": "like a like any number it can take any" | |
}, | |
{ | |
"start": 2615.28, | |
"text": "number plus or minus infinity so for" | |
}, | |
{ | |
"start": 2617.76, | |
"text": "example if you wanted to predict the uh" | |
}, | |
{ | |
"start": 2620.24, | |
"text": "grade that you get in a class right" | |
}, | |
{ | |
"start": 2623.28, | |
"text": "doesn't necessarily need to be between Z" | |
}, | |
{ | |
"start": 2625.16, | |
"text": "and one or Z and 100 even right you" | |
}, | |
{ | |
"start": 2627.92, | |
"text": "could now use a different loss in order" | |
}, | |
{ | |
"start": 2629.839, | |
"text": "to produce that value because our" | |
}, | |
{ | |
"start": 2631.76, | |
"text": "outputs are no longer a probability" | |
}, | |
{ | |
"start": 2633.96, | |
"text": "distribution right so for example what" | |
}, | |
{ | |
"start": 2636.16, | |
"text": "you might do here is compute a mean" | |
}, | |
{ | |
"start": 2638.119, | |
"text": "squared error probabil or mean squared" | |
}, | |
{ | |
"start": 2640.119, | |
"text": "error loss function between your true" | |
}, | |
{ | |
"start": 2641.839, | |
"text": "value or your true grade of the class" | |
}, | |
{ | |
"start": 2644.88, | |
"text": "and the predicted grade right these are" | |
}, | |
{ | |
"start": 2646.8, | |
"text": "two numbers they're not probabilities" | |
}, | |
{ | |
"start": 2648.88, | |
"text": "necessarily you compute their difference" | |
}, | |
{ | |
"start": 2651.24, | |
"text": "you square it to to look at a distance" | |
}, | |
{ | |
"start": 2653.52, | |
"text": "between the two an absolute distance" | |
}, | |
{ | |
"start": 2656.28, | |
"text": "right sign doesn't matter and then you" | |
}, | |
{ | |
"start": 2658.52, | |
"text": "can minimize this thing" | |
}, | |
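The mean squared error just described is the average of squared differences between true and predicted values — squaring makes the sign not matter (a minimal sketch):

```python
def mean_squared_error(y_true, y_pred):
    """Average squared distance between true and predicted values;
    over- and under-predictions cost the same because of the square."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Predicting grades (real values, not probabilities)
print(mean_squared_error([90.0, 75.0], [85.0, 80.0]))  # -> 25.0
```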
{ | |
"start": 2661.0, | |
"text": "right okay great so let's put all of" | |
}, | |
{ | |
"start": 2663.72, | |
"text": "this loss information with this problem" | |
}, | |
{ | |
"start": 2665.8, | |
"text": "of finding our Network" | |
}, | |
{ | |
"start": 2667.839, | |
"text": "into a unified problem and a unified" | |
}, | |
{ | |
"start": 2670.44, | |
"text": "solution to actually train our neural" | |
}, | |
{ | |
"start": 2674.079, | |
"text": "network so we know that we want to find" | |
}, | |
{ | |
"start": 2677.559, | |
"text": "a neural network that will solve this" | |
}, | |
{ | |
"start": 2679.559, | |
"text": "problem on all this data on average" | |
}, | |
{ | |
"start": 2681.92, | |
"text": "right that's how we contextualize this" | |
}, | |
{ | |
"start": 2684.0, | |
"text": "problem earlier in the lecture" | |
}, | |
{ | |
"start": 2686.24, | |
"text": "this means effectively that we're trying" | |
}, | |
{ | |
"start": 2687.76, | |
"text": "to solve or we're trying to find what" | |
}, | |
{ | |
"start": 2690.839, | |
"text": "are the weights for our neural network" | |
}, | |
{ | |
"start": 2693.079, | |
"text": "what is this big vector W that" | |
}, | |
{ | |
"start": 2695.8, | |
"text": "we talked about earlier in the" | |
}, | |
{ | |
"start": 2697.24, | |
"text": "lecture what is this Vector W compute" | |
}, | |
{ | |
"start": 2699.92, | |
"text": "this Vector W for me based on all of the" | |
}, | |
{ | |
"start": 2702.599, | |
"text": "data that we have seen right now the" | |
}, | |
{ | |
"start": 2705.559, | |
"text": "vector W is also going to determine what" | |
}, | |
{ | |
"start": 2709.64, | |
"text": "is the loss right so given a single" | |
}, | |
{ | |
"start": 2711.92, | |
"text": "Vector w we can compute how bad is this" | |
}, | |
{ | |
"start": 2715.2, | |
"text": "neural network performing on our data" | |
}, | |
{ | |
"start": 2718.0, | |
"text": "right so what is the loss what is this" | |
}, | |
{ | |
"start": 2720.119, | |
"text": "deviation from the ground truth of our" | |
}, | |
{ | |
"start": 2722.64, | |
"text": "network uh based on where it should" | |
}, | |
{ | |
"start": 2725.28, | |
"text": "be now remember that W is just a" | |
}, | |
{ | |
"start": 2729.559, | |
"text": "group of a bunch of numbers right it's a" | |
}, | |
{ | |
"start": 2732.559, | |
"text": "very big list of numbers a list of" | |
}, | |
{ | |
"start": 2735.48, | |
"text": "weights for every single layer and" | |
}, | |
{ | |
"start": 2738.52, | |
"text": "every single neuron in our neural" | |
}, | |
{ | |
"start": 2740.88, | |
"text": "network right so it's just a very big" | |
}, | |
{ | |
"start": 2743.359, | |
"text": "list or a vector of weights we want" | |
}, | |
{ | |
"start": 2745.839, | |
"text": "to find that Vector what is that Vector" | |
}, | |
{ | |
"start": 2748.04, | |
"text": "based on a lot of data that's the" | |
}, | |
{ | |
"start": 2749.599, | |
"text": "problem of training a neural network and" | |
}, | |
{ | |
"start": 2751.88, | |
"text": "remember our loss function is just a" | |
}, | |
{ | |
"start": 2754.24, | |
"text": "simple function of our weights if we" | |
}, | |
{ | |
"start": 2757.28, | |
"text": "have only two weights in our neural" | |
}, | |
{ | |
"start": 2758.92, | |
"text": "network like we saw earlier in the slide" | |
}, | |
{ | |
"start": 2761.04, | |
"text": "then we can plot the loss landscape over" | |
}, | |
{ | |
"start": 2763.839, | |
"text": "this two-dimensional space right so we" | |
}, | |
{ | |
"start": 2765.72, | |
"text": "have two weights W1 and W2 and for every" | |
}, | |
{ | |
"start": 2768.8, | |
"text": "single configuration or setting of those" | |
}, | |
{ | |
"start": 2772.04, | |
"text": "two weights our loss will have a" | |
}, | |
{ | |
"start": 2774.599, | |
"text": "particular value which here we're" | |
}, | |
{ | |
"start": 2775.88, | |
"text": "showing is the height of this graph" | |
}, | |
{ | |
"start": 2778.16, | |
"text": "right so for any W1 and W2 what is the" | |
}, | |
{ | |
"start": 2781.52, | |
"text": "loss and what we want to do is find the" | |
}, | |
{ | |
"start": 2784.52, | |
"text": "lowest point what is the best loss" | |
}, | |
{ | |
"start": 2787.48, | |
"text": "what are the weights such that our loss" | |
}, | |
{ | |
"start": 2790.359, | |
"text": "will be as good as possible so the" | |
}, | |
{ | |
"start": 2793.04, | |
"text": "smaller the loss the better so we want" | |
}, | |
{ | |
"start": 2794.48, | |
"text": "to find the lowest point in this" | |
}, | |
{ | |
"start": 2797.599, | |
"text": "graph now how do we do that right so the" | |
}, | |
{ | |
"start": 2800.76, | |
"text": "way this works is we start somewhere in" | |
}, | |
{ | |
"start": 2803.88, | |
"text": "this space we don't know where to start" | |
}, | |
{ | |
"start": 2805.24, | |
"text": "so let's pick a random place to start" | |
}, | |
{ | |
"start": 2808.079, | |
"text": "right now from that place let's compute" | |
}, | |
{ | |
"start": 2812.559, | |
"text": "what's called the gradient of the" | |
}, | |
{ | |
"start": 2814.359, | |
"text": "landscape at that particular point this" | |
}, | |
{ | |
"start": 2816.48, | |
"text": "is a very local estimate of where is" | |
}, | |
{ | |
"start": 2819.88, | |
"text": "going up basically where is the" | |
}, | |
{ | |
"start": 2822.079, | |
"text": "slope increasing at my current location" | |
}, | |
{ | |
"start": 2825.28, | |
"text": "right that informs us not only where the" | |
}, | |
{ | |
"start": 2827.2, | |
"text": "slope is increasing but more importantly" | |
}, | |
{ | |
"start": 2829.72, | |
"text": "where the slope is decreasing if I" | |
}, | |
{ | |
"start": 2831.28, | |
"text": "negate the direction if I go in the" | |
}, | |
{ | |
"start": 2832.68, | |
"text": "opposite direction I can actually step" | |
}, | |
{ | |
"start": 2835.04, | |
"text": "down into the landscape and change my" | |
}, | |
{ | |
"start": 2837.839, | |
"text": "weights such that I lower my" | |
}, | |
{ | |
"start": 2840.559, | |
"text": "loss so let's take a small step just a" | |
}, | |
{ | |
"start": 2843.359, | |
"text": "small step in the opposite direction of" | |
}, | |
{ | |
"start": 2845.319, | |
"text": "the part that's going up let's take a" | |
}, | |
{ | |
"start": 2847.559, | |
"text": "small step going down and we'll keep" | |
}, | |
{ | |
"start": 2849.88, | |
"text": "repeating this process we'll compute a" | |
}, | |
{ | |
"start": 2851.559, | |
"text": "new gradient at that new point and then" | |
}, | |
{ | |
"start": 2853.88, | |
"text": "we'll take another small step and we'll" | |
}, | |
{ | |
"start": 2855.28, | |
"text": "keep doing this over and over and over" | |
}, | |
{ | |
"start": 2856.96, | |
"text": "again until we converge at what's called" | |
}, | |
{ | |
"start": 2859.04, | |
"text": "a local minimum right so based on where" | |
}, | |
{ | |
"start": 2861.76, | |
"text": "we started it may not be a global" | |
}, | |
{ | |
"start": 2864.04, | |
"text": "minimum of everywhere in this loss" | |
}, | |
{ | |
"start": 2865.8, | |
"text": "landscape but let's find ourselves now" | |
}, | |
{ | |
"start": 2867.72, | |
"text": "in a local minimum and we're guaranteed" | |
}, | |
{ | |
"start": 2869.599, | |
"text": "to actually converge by following this" | |
}, | |
{ | |
"start": 2871.28, | |
"text": "very simple algorithm at a local" | |
}, | |
{ | |
"start": 2874.359, | |
"text": "minimum so let's summarize now this" | |
}, | |
{ | |
"start": 2876.44, | |
"text": "algorithm this algorithm is called" | |
}, | |
{ | |
"start": 2878.2, | |
"text": "gradient descent let's summarize it" | |
}, | |
{ | |
"start": 2879.8, | |
"text": "first in pseudo code and then we'll look" | |
}, | |
{ | |
"start": 2881.8, | |
"text": "at it in actual code in a second so" | |
}, | |
{ | |
"start": 2884.599, | |
"text": "there's a few steps first step is we" | |
}, | |
{ | |
"start": 2886.64, | |
"text": "initialize our location somewhere" | |
}, | |
{ | |
"start": 2889.2, | |
"text": "randomly in this weight space right we" | |
}, | |
{ | |
"start": 2892.4, | |
"text": "compute the gradient of our loss" | |
}, | |
{ | |
"start": 2897.04, | |
"text": "with respect to our weights okay and" | |
}, | |
{ | |
"start": 2900.24, | |
"text": "then we take a small step in the" | |
}, | |
{ | |
"start": 2901.76, | |
"text": "opposite direction and we keep repeating" | |
}, | |
{ | |
"start": 2903.76, | |
"text": "this in a loop over and over and over" | |
}, | |
{ | |
"start": 2905.48, | |
"text": "again and we keep doing" | |
}, | |
{ | |
"start": 2907.2, | |
"text": "this until convergence right until we" | |
}, | |
{ | |
"start": 2909.359, | |
"text": "stop moving basically and our Network" | |
}, | |
{ | |
"start": 2911.72, | |
"text": "basically finds where it's supposed to" | |
}, | |
{ | |
"start": 2913.359, | |
"text": "end up we'll talk about" | |
}, | |
{ | |
"start": 2917.0, | |
"text": "this small step right so we're" | |
}, | |
{ | |
"start": 2918.599, | |
"text": "multiplying our gradient by what I keep" | |
}, | |
{ | |
"start": 2920.92, | |
"text": "calling a small step we'll talk about" | |
}, | |
{ | |
"start": 2923.0, | |
"text": "that a bit more in" | |
}, | |
{ | |
"start": 2925.72, | |
"text": "a later part of this lecture but for" | |
}, | |
{ | |
"start": 2928.079, | |
"text": "now let's also very quickly show the" | |
}, | |
{ | |
"start": 2930.079, | |
"text": "analogous part in code as well and it" | |
}, | |
{ | |
"start": 2933.28, | |
"text": "mirrors very nicely right so we'll" | |
}, | |
{ | |
"start": 2935.2, | |
"text": "randomly initialize our weights" | |
}, | |
{ | |
"start": 2937.599, | |
"text": "this happens every time you train a" | |
}, | |
{ | |
"start": 2938.92, | |
"text": "neural network you have to randomly" | |
}, | |
{ | |
"start": 2940.28, | |
"text": "initialize the weights and then you have" | |
}, | |
{ | |
"start": 2941.92, | |
"text": "a loop right here shown without" | |
}, | |
{ | |
"start": 2944.799, | |
"text": "even a convergence check right we're just going" | |
}, | |
{ | |
"start": 2946.359, | |
"text": "to keep looping forever where we say" | |
}, | |
{ | |
"start": 2949.119, | |
"text": "okay we're going to compute the loss at" | |
}, | |
{ | |
"start": 2950.76, | |
"text": "that location compute the gradient so" | |
}, | |
{ | |
"start": 2953.28, | |
"text": "which way is up and then we just negate" | |
}, | |
{ | |
"start": 2956.359, | |
"text": "that gradient multiply it by what's" | |
}, | |
{ | |
"start": 2958.48, | |
"text": "called the learning rate denoted LR here" | |
}, | |
{ | |
"start": 2960.839, | |
"text": "it's a small number and then we take a" | |
}, | |
{ | |
"start": 2963.119, | |
"text": "small step in that" | |
}, | |
{ | |
"start": 2965.319, | |
"text": "direction so let's take a deeper look at this" | |
}, | |
{ | |
"start": 2968.119, | |
"text": "term here this is called the gradient" | |
}, | |
{ | |
"start": 2969.92, | |
"text": "right this tells us which way is up in" | |
}, | |
{ | |
"start": 2971.92, | |
"text": "that landscape and this again it tells" | |
}, | |
{ | |
"start": 2974.839, | |
"text": "us even more than that it tells us how" | |
}, | |
{ | |
"start": 2976.64, | |
"text": "our loss is" | |
}, | |
{ | |
"start": 2979.319, | |
"text": "changing as a function of all of our" | |
}, | |
{ | |
"start": 2981.799, | |
"text": "weights but I actually have not told you" | |
}, | |
{ | |
"start": 2984.44, | |
"text": "how to compute this so let's talk about" | |
}, | |
{ | |
"start": 2986.559, | |
"text": "that process that process is called back" | |
}, | |
{ | |
"start": 2988.68, | |
"text": "propagation we'll go through this very" | |
}, | |
{ | |
"start": 2990.72, | |
"text": "very briefly and we'll start with the" | |
}, | |
{ | |
"start": 2993.24, | |
"text": "simplest neural network uh that's" | |
}, | |
{ | |
"start": 2995.68, | |
"text": "possible right so we already saw the" | |
}, | |
{ | |
"start": 2997.68, | |
"text": "simplest building block which is a" | |
}, | |
{ | |
"start": 2999.24, | |
"text": "single neuron now let's build the" | |
}, | |
{ | |
"start": 3000.599, | |
"text": "simplest neural network which is just a" | |
}, | |
{ | |
"start": 3002.88, | |
"text": "one neuron neural network right so it" | |
}, | |
{ | |
"start": 3005.24, | |
"text": "has one hidden neuron it goes from input" | |
}, | |
{ | |
"start": 3007.2, | |
"text": "to Hidden neuron to output and we want" | |
}, | |
{ | |
"start": 3009.839, | |
"text": "to compute the gradient of our loss with" | |
}, | |
{ | |
"start": 3012.24, | |
"text": "respect to this weight W2 okay so I'm" | |
}, | |
{ | |
"start": 3015.92, | |
"text": "highlighting it here so we have two" | |
}, | |
{ | |
"start": 3017.68, | |
"text": "weights let's compute the gradient first" | |
}, | |
{ | |
"start": 3020.48, | |
"text": "with respect to W2 and that tells us how" | |
}, | |
{ | |
"start": 3023.72, | |
"text": "much does a small change in W2 affect" | |
}, | |
{ | |
"start": 3027.68, | |
"text": "our loss does our loss go up or down if" | |
}, | |
{ | |
"start": 3029.88, | |
"text": "we move our W2 a little bit in one" | |
}, | |
{ | |
"start": 3032.2, | |
"text": "direction or another so let's write out" | |
}, | |
{ | |
"start": 3035.0, | |
"text": "this derivative we can start by applying" | |
}, | |
{ | |
"start": 3037.0, | |
"text": "the chain rule backwards from the loss" | |
}, | |
{ | |
"start": 3039.68, | |
"text": "through the" | |
}, | |
{ | |
"start": 3040.559, | |
"text": "output and specifically we can actually" | |
}, | |
{ | |
"start": 3043.64, | |
"text": "decompose this derivative" | |
}, | |
{ | |
"start": 3047.0, | |
"text": "this gradient into two parts right so" | |
}, | |
{ | |
"start": 3049.16, | |
"text": "the first part we're decomposing it from" | |
}, | |
{ | |
"start": 3051.52, | |
"text": "dJ/" | |
}, | |
{ | |
"start": 3052.68, | |
"text": "dW2 into dJ/dy right which is our output" | |
}, | |
{ | |
"start": 3058.839, | |
"text": "multiplied by dy/dW2 right this is all" | |
}, | |
{ | |
"start": 3062.319, | |
"text": "possible right it's the chain rule" | |
}, | |
{ | |
"start": 3064.839, | |
"text": "I'm just reciting the chain rule here from" | |
}, | |
{ | |
"start": 3067.92, | |
"text": "calculus this is possible because Y is" | |
}, | |
{ | |
"start": 3070.359, | |
"text": "only dependent on the previous layer and" | |
}, | |
{ | |
"start": 3073.24, | |
"text": "now let's suppose we don't want to do" | |
}, | |
{ | |
"start": 3074.48, | |
"text": "this for W2 but we want to do it for W1" | |
}, | |
{ | |
"start": 3076.96, | |
"text": "we can use the exact same process right" | |
}, | |
{ | |
"start": 3078.64, | |
"text": "but now it's one step further right" | |
}, | |
{ | |
"start": 3080.76, | |
"text": "we'll now replace W2 with W1 we need to" | |
}, | |
{ | |
"start": 3083.4, | |
"text": "apply the chain rule yet again once" | |
}, | |
{ | |
"start": 3085.52, | |
"text": "again to decompose the problem further" | |
}, | |
{ | |
"start": 3087.2, | |
"text": "and now we propagate our old gradient" | |
}, | |
{ | |
"start": 3089.0, | |
"text": "that we computed for W2 all the way back" | |
}, | |
{ | |
"start": 3092.28, | |
"text": "one more step uh to the weight that" | |
}, | |
{ | |
"start": 3094.48, | |
"text": "we're interested in which in this case" | |
}, | |
{ | |
"start": 3095.92, | |
"text": "is" | |
}, | |
{ | |
"start": 3097.0, | |
"text": "W1 and we keep repeating this process" | |
}, | |
{ | |
"start": 3099.68, | |
"text": "over and over again propagating these" | |
}, | |
{ | |
"start": 3101.4, | |
"text": "gradients backwards from output to input" | |
}, | |
{ | |
"start": 3104.4, | |
"text": "to compute ultimately what we want in" | |
}, | |
{ | |
"start": 3106.799, | |
"text": "the end is this derivative of every" | |
}, | |
{ | |
"start": 3109.64, | |
"text": "weight so the derivative of our loss" | |
}, | |
{ | |
"start": 3112.48, | |
"text": "with respect to every weight in our" | |
}, | |
{ | |
"start": 3114.04, | |
"text": "neural network this tells us how much" | |
}, | |
{ | |
"start": 3115.799, | |
"text": "does a small change in every single" | |
}, | |
{ | |
"start": 3117.559, | |
"text": "weight in our Network affect the loss" | |
}, | |
{ | |
"start": 3119.44, | |
"text": "does our loss go up or down if we change" | |
}, | |
{ | |
"start": 3121.24, | |
"text": "this weight a little bit in this" | |
}, | |
{ | |
"start": 3122.799, | |
"text": "direction or a little bit in that" | |
}, | |
{ | |
"start": 3124.079, | |
"text": "direction yes I think you use the term" | |
}, | |
{ | |
"start": 3127.16, | |
"text": "neuron and perceptron is there a" | |
}, | |
{ | |
"start": 3129.2, | |
"text": "functional difference neuron and" | |
}, | |
{ | |
"start": 3130.76, | |
"text": "perceptron are the same so typically" | |
}, | |
{ | |
"start": 3132.64, | |
"text": "people say neural network which is why" | |
}, | |
{ | |
"start": 3134.52, | |
"text": "the term neuron has also gained" | |
}, | |
{ | |
"start": 3136.559, | |
"text": "popularity but originally perceptron" | |
}, | |
{ | |
"start": 3139.2, | |
"text": "is the formal term the two terms" | |
}, | |
{ | |
"start": 3141.88, | |
"text": "are" | |
}, | |
{ | |
"start": 3144.48, | |
"text": "identical Okay so now we've covered a" | |
}, | |
{ | |
"start": 3148.0, | |
"text": "lot so we've covered the forward" | |
}, | |
{ | |
"start": 3149.28, | |
"text": "propagation of information through a" | |
}, | |
{ | |
"start": 3150.839, | |
"text": "neuron and through a neural network all" | |
}, | |
{ | |
"start": 3153.2, | |
"text": "the way through and we've covered now" | |
}, | |
{ | |
"start": 3155.04, | |
"text": "the back propagation of information to" | |
}, | |
{ | |
"start": 3157.839, | |
"text": "understand how we should uh change every" | |
}, | |
{ | |
"start": 3160.16, | |
"text": "single one of those weights in our" | |
}, | |
{ | |
"start": 3161.44, | |
"text": "neural network to improve our" | |
}, | |
{ | |
"start": 3164.319, | |
"text": "loss so that was the backprop" | |
}, | |
{ | |
"start": 3166.839, | |
"text": "algorithm in theory it's actually pretty" | |
}, | |
{ | |
"start": 3169.559, | |
"text": "simple it's just a chain rule right" | |
}, | |
{ | |
"start": 3171.64, | |
"text": "there's actually nothing" | |
}, | |
{ | |
"start": 3172.92, | |
"text": "more than just the chain rule and" | |
}, | |
{ | |
"start": 3175.799, | |
"text": "the nice part is that deep learning" | |
}, | |
{ | |
"start": 3177.2, | |
"text": "libraries actually do this for you so" | |
}, | |
{ | |
"start": 3178.92, | |
"text": "they compute back prop for you you don't" | |
}, | |
{ | |
"start": 3180.599, | |
"text": "actually have to implement it yourself" | |
}, | |
{ | |
"start": 3181.96, | |
"text": "which is very convenient but now it's" | |
}, | |
{ | |
"start": 3184.04, | |
"text": "important to touch on even though the" | |
}, | |
{ | |
"start": 3186.24, | |
"text": "theory is actually not that complicated" | |
}, | |
{ | |
"start": 3188.119, | |
"text": "for back propagation let's touch on it" | |
}, | |
{ | |
"start": 3190.28, | |
"text": "now from practice now thinking a little" | |
}, | |
{ | |
"start": 3192.559, | |
"text": "bit towards your own implementations" | |
}, | |
{ | |
"start": 3194.2, | |
"text": "when you want to implement these neural" | |
}, | |
{ | |
"start": 3196.079, | |
"text": "networks what are some insights so" | |
}, | |
{ | |
"start": 3198.92, | |
"text": "optimization of neural networks in" | |
}, | |
{ | |
"start": 3200.76, | |
"text": "practice is a completely different story" | |
}, | |
{ | |
"start": 3202.839, | |
"text": "it's not straightforward at all and in" | |
}, | |
{ | |
"start": 3205.64, | |
"text": "practice it's very difficult and usually" | |
}, | |
{ | |
"start": 3207.799, | |
"text": "very computationally intensive to do" | |
}, | |
{ | |
"start": 3209.799, | |
"text": "this backprop algorithm so here's an" | |
}, | |
{ | |
"start": 3212.079, | |
"text": "illustration from a paper that came out" | |
}, | |
{ | |
"start": 3214.079, | |
"text": "a few years ago that actually attempted" | |
}, | |
{ | |
"start": 3216.52, | |
"text": "to visualize a very deep neural" | |
}, | |
{ | |
"start": 3218.599, | |
"text": "network's loss landscape so previously" | |
}, | |
{ | |
"start": 3220.599, | |
"text": "we had that other" | |
}, | |
{ | |
"start": 3222.96, | |
"text": "visualization of how a neural network" | |
}, | |
{ | |
"start": 3225.0, | |
"text": "would look in a two-dimensional" | |
}, | |
{ | |
"start": 3226.0, | |
"text": "landscape real neural networks are not" | |
}, | |
{ | |
"start": 3228.04, | |
"text": "two-dimensional" | |
}, | |
{ | |
"start": 3229.68, | |
"text": "they're hundreds or millions or billions" | |
}, | |
{ | |
"start": 3232.2, | |
"text": "of dimensions and now what would those" | |
}, | |
{ | |
"start": 3235.799, | |
"text": "loss landscapes look like you can" | |
}, | |
{ | |
"start": 3237.599, | |
"text": "actually try some clever techniques to" | |
}, | |
{ | |
"start": 3239.64, | |
"text": "actually visualize them this is one" | |
}, | |
{ | |
"start": 3240.88, | |
"text": "paper that attempted to do that and it" | |
}, | |
{ | |
"start": 3243.28, | |
"text": "turns out that they look extremely messy" | |
}, | |
{ | |
"start": 3246.68, | |
"text": "right um the important thing is that if" | |
}, | |
{ | |
"start": 3249.799, | |
"text": "you do this algorithm and you start in a" | |
}, | |
{ | |
"start": 3251.88, | |
"text": "bad place depending on your neural" | |
}, | |
{ | |
"start": 3253.64, | |
"text": "network you may not actually end up in" | |
}, | |
{ | |
"start": 3255.92, | |
"text": "the the global solution right so your" | |
}, | |
{ | |
"start": 3258.0, | |
"text": "initialization matters a lot and you" | |
}, | |
{ | |
"start": 3260.04, | |
"text": "need to kind of Traverse these local" | |
}, | |
{ | |
"start": 3261.839, | |
"text": "minima to try and help you find" | |
}, | |
{ | |
"start": 3264.24, | |
"text": "the global Minima or even more than that" | |
}, | |
{ | |
"start": 3266.799, | |
"text": "you need to construct neural networks" | |
}, | |
{ | |
"start": 3269.48, | |
"text": "that have loss landscapes that are much" | |
}, | |
{ | |
"start": 3271.88, | |
"text": "more amenable to optimization than this" | |
}, | |
{ | |
"start": 3274.04, | |
"text": "one right so this is a very bad loss" | |
}, | |
{ | |
"start": 3275.599, | |
"text": "landscape there are some techniques that" | |
}, | |
{ | |
"start": 3277.64, | |
"text": "we can apply to our neural networks that" | |
}, | |
{ | |
"start": 3279.92, | |
"text": "smooth out their loss landscape and make" | |
}, | |
{ | |
"start": 3281.68, | |
"text": "them easier to" | |
}, | |
{ | |
"start": 3283.04, | |
"text": "optimize so recall that update equation" | |
}, | |
{ | |
"start": 3286.04, | |
"text": "that we talked about earlier with" | |
}, | |
{ | |
"start": 3287.92, | |
"text": "gradient descent right so there is this" | |
}, | |
{ | |
"start": 3289.76, | |
"text": "parameter here that we didn't talk about" | |
}, | |
{ | |
"start": 3292.24, | |
"text": "we described this as the little step" | |
}, | |
{ | |
"start": 3294.2, | |
"text": "that you could take right so it's a" | |
}, | |
{ | |
"start": 3295.359, | |
"text": "small number that we multiply with the" | |
}, | |
{ | |
"start": 3297.76, | |
"text": "direction which is your gradient it just" | |
}, | |
{ | |
"start": 3299.72, | |
"text": "tells you okay I'm not going to just go" | |
}, | |
{ | |
"start": 3301.44, | |
"text": "all the way in this direction I'll just" | |
}, | |
{ | |
"start": 3302.839, | |
"text": "take a small step in this direction so" | |
}, | |
{ | |
"start": 3305.359, | |
"text": "in practice even setting this value" | |
}, | |
{ | |
"start": 3307.88, | |
"text": "right it's just one number setting this" | |
}, | |
{ | |
"start": 3309.68, | |
"text": "one number can be rather difficult right" | |
}, | |
{ | |
"start": 3312.839, | |
"text": "if we set the learning rate too small" | |
}, | |
{ | |
"start": 3316.68, | |
"text": "then the model can get stuck in these" | |
}, | |
{ | |
"start": 3319.04, | |
"text": "local Minima right so here it starts and" | |
}, | |
{ | |
"start": 3321.359, | |
"text": "it kind of gets stuck in this local" | |
}, | |
{ | |
"start": 3322.839, | |
"text": "Minima it converges very slowly even if" | |
}, | |
{ | |
"start": 3325.2, | |
"text": "it doesn't get stuck if the learning" | |
}, | |
{ | |
"start": 3327.24, | |
"text": "rate is too large it can kind of" | |
}, | |
{ | |
"start": 3328.96, | |
"text": "overshoot and in practice it even" | |
}, | |
{ | |
"start": 3331.079, | |
"text": "diverges and explodes and you don't" | |
}, | |
{ | |
"start": 3333.839, | |
"text": "actually ever find any" | |
}, | |
{ | |
"start": 3335.839, | |
"text": "Minima now ideally what we want is to" | |
}, | |
{ | |
"start": 3338.599, | |
"text": "use learning rates that are not too" | |
}, | |
{ | |
"start": 3340.4, | |
"text": "small and not too large so they're" | |
}, | |
{ | |
"start": 3343.4, | |
"text": "large enough to basically avoid those" | |
}, | |
{ | |
"start": 3345.039, | |
"text": "local Minima but small enough such that" | |
}, | |
{ | |
"start": 3347.88, | |
"text": "they won't diverge and they will" | |
}, | |
{ | |
"start": 3349.28, | |
"text": "actually still find their way into the" | |
}, | |
{ | |
"start": 3352.039, | |
"text": "global Minima so something like this is" | |
}, | |
{ | |
"start": 3354.24, | |
"text": "what you should intuitively have in mind" | |
}, | |
{ | |
"start": 3356.079, | |
"text": "right so something that can overshoot" | |
}, | |
{ | |
"start": 3357.44, | |
"text": "the local minima but find itself in" | |
}, | |
{ | |
"start": 3359.96, | |
"text": "a better Minima and then finally" | |
}, | |
{ | |
"start": 3362.119, | |
"text": "stabilize itself there so how do we" | |
}, | |
{ | |
"start": 3364.44, | |
"text": "actually set these learning rates right" | |
}, | |
{ | |
"start": 3366.44, | |
"text": "in practice what does that process look" | |
}, | |
{ | |
"start": 3368.16, | |
"text": "like now idea number one is very" | |
}, | |
{ | |
"start": 3371.44, | |
"text": "basic right it's try a bunch of" | |
}, | |
{ | |
"start": 3372.839, | |
"text": "different learning rates and see what" | |
}, | |
{ | |
"start": 3374.16, | |
"text": "works and that's actually not a bad" | |
}, | |
{ | |
"start": 3377.28, | |
"text": "process in practice it's one of the" | |
}, | |
{ | |
"start": 3378.799, | |
"text": "processes that people use so" | |
}, | |
{ | |
"start": 3382.28, | |
"text": "that's interesting but let's" | |
}, | |
{ | |
"start": 3383.96, | |
"text": "see if we can do something smarter than" | |
}, | |
{ | |
"start": 3385.48, | |
"text": "this and let's see how we can design" | |
}, | |
{ | |
"start": 3387.64, | |
"text": "algorithms that uh can adapt to the" | |
}, | |
{ | |
"start": 3390.52, | |
"text": "Landscapes right so in practice there's" | |
}, | |
{ | |
"start": 3392.64, | |
"text": "no reason why this should be a single" | |
}, | |
{ | |
"start": 3394.119, | |
"text": "number right can we have learning rates" | |
}, | |
{ | |
"start": 3397.119, | |
"text": "that adapt to the model to the data to" | |
}, | |
{ | |
"start": 3400.2, | |
"text": "the Landscapes to the gradients that" | |
}, | |
{ | |
"start": 3401.799, | |
"text": "it's seeing so this means that" | |
}, | |
{ | |
"start": 3404.039, | |
"text": "the learning rate may actually increase" | |
}, | |
{ | |
"start": 3406.2, | |
"text": "or decrease as a function of the" | |
}, | |
{ | |
"start": 3409.0, | |
"text": "gradients in the loss function right how" | |
}, | |
{ | |
"start": 3411.72, | |
"text": "fast we're learning or many other" | |
}, | |
{ | |
"start": 3413.799, | |
"text": "options right there are many different" | |
}, | |
{ | |
"start": 3415.76, | |
"text": "ideas that could be done here and in" | |
}, | |
{ | |
"start": 3417.359, | |
"text": "fact there are many widely used" | |
}, | |
{ | |
"start": 3420.44, | |
"text": "different procedures or methodologies" | |
}, | |
{ | |
"start": 3423.28, | |
"text": "for setting the learning rate and during" | |
}, | |
{ | |
"start": 3425.88, | |
"text": "your Labs we actually encourage you to" | |
}, | |
{ | |
"start": 3427.799, | |
"text": "try out some of these different ideas" | |
}, | |
{ | |
"start": 3429.96, | |
"text": "for different types of learning rates" | |
}, | |
{ | |
"start": 3431.44, | |
"text": "and even play around with you know" | |
}, | |
{ | |
"start": 3433.48, | |
"text": "what's the effect of increasing or" | |
}, | |
{ | |
"start": 3435.119, | |
"text": "decreasing your learning rate you'll see" | |
}, | |
{ | |
"start": 3436.599, | |
"text": "very striking" | |
}, | |
{ | |
"start": 3439.559, | |
"text": "differences do it because it's on a" | |
}, | |
{ | |
"start": 3441.44, | |
"text": "closed interval why not just find the" | |
}, | |
{ | |
"start": 3443.799, | |
"text": "absolute minimum you know test" | |
}, | |
{ | |
"start": 3447.96, | |
"text": "right so a few things number one" | |
}, | |
{ | |
"start": 3450.559, | |
"text": "is that it's not a closed space right so" | |
}, | |
{ | |
"start": 3452.76, | |
"text": "there's an infinite space every weight" | |
}, | |
{ | |
"start": 3454.68, | |
"text": "can be plus or minus up to Infinity" | |
}, | |
{ | |
"start": 3457.28, | |
"text": "right so even if it was a" | |
}, | |
{ | |
"start": 3459.319, | |
"text": "one-dimensional neural network with just" | |
}, | |
{ | |
"start": 3461.24, | |
"text": "one weight it's not a closed" | |
}, | |
{ | |
"start": 3463.559, | |
"text": "space in practice it's even worse than" | |
}, | |
{ | |
"start": 3466.079, | |
"text": "that because you have billions of" | |
}, | |
{ | |
"start": 3468.839, | |
"text": "dimensions right so not only is your" | |
}, | |
{ | |
"start": 3472.119, | |
"text": "space your support system in one" | |
}, | |
{ | |
"start": 3474.4, | |
"text": "dimension infinite but you now" | |
}, | |
{ | |
"start": 3476.92, | |
"text": "have billions of infinite Dimensions" | |
}, | |
{ | |
"start": 3478.76, | |
"text": "right or billions of uh infinite support" | |
}, | |
{ | |
"start": 3480.88, | |
"text": "spaces so it's not something that you" | |
}, | |
{ | |
"start": 3482.799, | |
"text": "can just like search every weight every" | |
}, | |
{ | |
"start": 3484.92, | |
"text": "possible weight in your neural network" | |
}, | |
{ | |
"start": 3487.68, | |
"text": "configuration or what is every possible" | |
}, | |
{ | |
"start": 3489.4, | |
"text": "weight that this neural network could" | |
}, | |
{ | |
"start": 3490.64, | |
"text": "take and let me test them out because it" | |
}, | |
{ | |
"start": 3493.799, | |
"text": "it's not practical to do even for a very" | |
}, | |
{ | |
"start": 3495.52, | |
"text": "small neural network in" | |
}, | |
{ | |
"start": 3498.96, | |
"text": "practice so in your Labs you can really" | |
}, | |
{ | |
"start": 3501.64, | |
"text": "try to put all of this information uh in" | |
}, | |
{ | |
"start": 3504.16, | |
"text": "this picture into practice which defines" | |
}, | |
{ | |
"start": 3506.96, | |
"text": "your model number one right here defines" | |
}, | |
{ | |
"start": 3510.599, | |
"text": "your Optimizer which previously we" | |
}, | |
{ | |
"start": 3513.48, | |
"text": "denoted as this gradient descent" | |
}, | |
{ | |
"start": 3515.16, | |
"text": "Optimizer here we're calling it uh" | |
}, | |
{ | |
"start": 3517.24, | |
"text": "stochastic gradient descent or SGD we'll" | |
}, | |
{ | |
"start": 3519.64, | |
"text": "talk about that more in a second and" | |
}, | |
{ | |
"start": 3521.799, | |
"text": "then also note that your Optimizer which" | |
}, | |
{ | |
"start": 3524.839, | |
"text": "here we're calling SGD could be any of" | |
}, | |
{ | |
"start": 3527.52, | |
"text": "these adaptive optimizers you can swap" | |
}, | |
{ | |
"start": 3529.28, | |
"text": "them out and you should swap them out" | |
}, | |
{ | |
"start": 3530.64, | |
"text": "you should test different things here to" | |
}, | |
{ | |
"start": 3532.119, | |
"text": "see the impact of these different" | |
}, | |
{ | |
"start": 3534.44, | |
"text": "methods on your training procedure and" | |
}, | |
{ | |
"start": 3536.96, | |
"text": "you'll gain very valuable intuition for" | |
}, | |
{ | |
"start": 3539.96, | |
"text": "the different insights that will come" | |
}, | |
{ | |
"start": 3541.319, | |
"text": "with that as well so I want to continue" | |
}, | |
{ | |
"start": 3543.64, | |
"text": "very briefly just for the end of this" | |
}, | |
{ | |
"start": 3545.16, | |
"text": "lecture to talk about tips for training" | |
}, | |
{ | |
"start": 3547.88, | |
"text": "neural networks in practice and how we" | |
}, | |
{ | |
"start": 3549.92, | |
"text": "can focus on this powerful idea of" | |
}, | |
{ | |
"start": 3553.359, | |
"text": "really what's called batching data right" | |
}, | |
{ | |
"start": 3555.96, | |
"text": "not seeing all of your data but now" | |
}, | |
{ | |
"start": 3558.44, | |
"text": "talking about a topic called" | |
}, | |
{ | |
"start": 3560.359, | |
"text": "batching so to do this let's very" | |
}, | |
{ | |
"start": 3562.599, | |
"text": "briefly revisit this gradient descent" | |
}, | |
{ | |
"start": 3564.319, | |
"text": "algorithm the gradient is compute this" | |
}, | |
{ | |
"start": 3567.16, | |
"text": "gradient computation the backrop" | |
}, | |
{ | |
"start": 3569.039, | |
"text": "algorithm I mentioned this earlier it's" | |
}, | |
{ | |
"start": 3570.839, | |
"text": "a very computationally expensive uh" | |
}, | |
{ | |
"start": 3573.72, | |
"text": "operation and it's even worse because we" | |
}, | |
{ | |
"start": 3576.24, | |
"text": "now are we previously described it in a" | |
}, | |
{ | |
"start": 3578.44, | |
"text": "way where we would have to compute it" | |
}, | |
{ | |
"start": 3580.0, | |
"text": "over a summation over every single data" | |
}, | |
{ | |
"start": 3582.64, | |
"text": "point in our entire data set right" | |
}, | |
{ | |
"start": 3584.92, | |
"text": "that's how we defined it with the loss" | |
}, | |
{ | |
"start": 3586.24, | |
"text": "function it's an average over all of our" | |
}, | |
{ | |
"start": 3588.079, | |
"text": "data points which means that we're" | |
}, | |
{ | |
"start": 3589.48, | |
"text": "summing over all of our data points the" | |
}, | |
{ | |
"start": 3591.44, | |
"text": "gradients so in most real life problems" | |
}, | |
{ | |
"start": 3594.359, | |
"text": "this would be completely infeasible to" | |
}, | |
{ | |
"start": 3596.119, | |
"text": "do because our data sets are simply too" | |
}, | |
{ | |
"start": 3597.72, | |
"text": "big and the models are too big to to" | |
}, | |
{ | |
"start": 3600.079, | |
"text": "compute those gradients on every single" | |
}, | |
{ | |
"start": 3601.72, | |
"text": "iteration remember this isn't just a" | |
}, | |
{ | |
"start": 3603.2, | |
"text": "onetime thing right it's every single" | |
}, | |
{ | |
"start": 3605.319, | |
"text": "step that you do you keep taking small" | |
}, | |
{ | |
"start": 3607.079, | |
"text": "steps so you keep need you keep needing" | |
}, | |
{ | |
"start": 3609.16, | |
"text": "to repeat this process so instead let's" | |
}, | |
{ | |
"start": 3611.68, | |
"text": "define a new gradient descent algorithm" | |
}, | |
{ | |
"start": 3613.68, | |
"text": "called SGD stochastic gradient descent" | |
}, | |
{ | |
"start": 3616.76, | |
"text": "instead of computing the gradient over" | |
}, | |
{ | |
"start": 3618.48, | |
"text": "the entire data set now let's just pick" | |
}, | |
{ | |
"start": 3621.68, | |
"text": "a single training point and compute that" | |
}, | |
{ | |
"start": 3624.4, | |
"text": "one training Point gradient" | |
}, | |
{ | |
"start": 3626.48, | |
"text": "right the nice thing about that is that" | |
}, | |
{ | |
"start": 3628.839, | |
"text": "it's much easier to compute that" | |
}, | |
{ | |
"start": 3630.72, | |
"text": "gradient right it only needs one point" | |
}, | |
{ | |
"start": 3633.16, | |
"text": "and the downside is that it's very noisy" | |
}, | |
{ | |
"start": 3636.28, | |
"text": "it's very stochastic since it was" | |
}, | |
{ | |
"start": 3638.359, | |
"text": "computed using just that one examples" | |
}, | |
{ | |
"start": 3640.2, | |
"text": "right so you have that that tradeoff" | |
}, | |
{ | |
"start": 3641.96, | |
"text": "that" | |
}, | |
{ | |
"start": 3642.72, | |
"text": "exists so what's the middle ground right" | |
}, | |
{ | |
"start": 3645.24, | |
"text": "the middle ground is to take not one" | |
}, | |
{ | |
"start": 3647.079, | |
"text": "data point and not the full data set but" | |
}, | |
{ | |
"start": 3650.359, | |
"text": "a batch of data right so take a what's" | |
}, | |
{ | |
"start": 3652.079, | |
"text": "called a mini batch right this could be" | |
}, | |
{ | |
"start": 3653.799, | |
"text": "something in practice like 32 pieces of" | |
}, | |
{ | |
"start": 3656.24, | |
"text": "data is a common batch size and this" | |
}, | |
{ | |
"start": 3658.92, | |
"text": "gives us an estimate of the true" | |
}, | |
{ | |
"start": 3660.839, | |
"text": "gradient right so you approximate the" | |
}, | |
{ | |
"start": 3662.52, | |
"text": "gradient by averaging the gradient of" | |
}, | |
{ | |
"start": 3664.599, | |
"text": "these 32 samples it's still fast because" | |
}, | |
{ | |
"start": 3668.0, | |
"text": "32 is much smaller than the size of your" | |
}, | |
{ | |
"start": 3670.24, | |
"text": "entire data set but it's pretty quick" | |
}, | |
{ | |
"start": 3672.96, | |
"text": "now right it's still noisy but it's okay" | |
}, | |
{ | |
"start": 3675.039, | |
"text": "usually in practice because you can" | |
}, | |
{ | |
"start": 3676.359, | |
"text": "still iterate much" | |
}, | |
{ | |
"start": 3678.4, | |
"text": "faster and since B is normally not that" | |
}, | |
{ | |
"start": 3681.0, | |
"text": "large again think of something like in" | |
}, | |
{ | |
"start": 3682.96, | |
"text": "the tens or the hundreds of samples it's" | |
}, | |
{ | |
"start": 3686.0, | |
"text": "very fast to compute this in practice" | |
}, | |
{ | |
"start": 3688.039, | |
"text": "compared to regular gradient descent and" | |
}, | |
{ | |
"start": 3690.319, | |
"text": "it's also much more accurate compared to" | |
}, | |
{ | |
"start": 3692.4, | |
"text": "stochastic gradient descent and the" | |
}, | |
{ | |
"start": 3694.559, | |
"text": "increase in accuracy of this gradient" | |
}, | |
{ | |
"start": 3697.0, | |
"text": "estimation allows us to converge to our" | |
}, | |
{ | |
"start": 3699.52, | |
"text": "solution significantly faster as well" | |
}, | |
{ | |
"start": 3702.44, | |
"text": "right it's not only about the speed it's" | |
}, | |
{ | |
"start": 3704.359, | |
"text": "just about the increase in accuracy of" | |
}, | |
{ | |
"start": 3706.2, | |
"text": "those gradients allows us to get to our" | |
}, | |
{ | |
"start": 3708.4, | |
"text": "solution much" | |
}, | |
{ | |
"start": 3709.92, | |
"text": "faster which ultimately means that we" | |
}, | |
{ | |
"start": 3712.0, | |
"text": "can train much faster as well and we can" | |
}, | |
{ | |
"start": 3714.039, | |
"text": "save compute and the other really nice" | |
}, | |
{ | |
"start": 3716.88, | |
"text": "thing about mini batches is that they" | |
}, | |
{ | |
"start": 3719.559, | |
"text": "allow for parallelizing our computation" | |
}, | |
{ | |
"start": 3723.24, | |
"text": "right and that was a concept that we had" | |
}, | |
{ | |
"start": 3724.64, | |
"text": "talked about earlier in the class as" | |
}, | |
{ | |
"start": 3726.0, | |
"text": "well and here's where it's coming in we" | |
}, | |
{ | |
"start": 3727.92, | |
"text": "can split up those batches right so" | |
}, | |
{ | |
"start": 3730.079, | |
"text": "those 32 pieces of data let's say if our" | |
}, | |
{ | |
"start": 3732.2, | |
"text": "batch size is 32 we can split them up" | |
}, | |
{ | |
"start": 3734.68, | |
"text": "onto different workers right different" | |
}, | |
{ | |
"start": 3737.079, | |
"text": "parts of the GPU can tackle those" | |
}, | |
{ | |
"start": 3739.359, | |
"text": "different parts of our data points this" | |
}, | |
{ | |
"start": 3742.839, | |
"text": "can allow us to basically achieve even" | |
}, | |
{ | |
"start": 3744.599, | |
"text": "more significant speed up using GPU" | |
}, | |
{ | |
"start": 3747.279, | |
"text": "architectures and GPU Hardware okay" | |
}, | |
{ | |
"start": 3750.16, | |
"text": "finally last topic I want to talk about" | |
}, | |
{ | |
"start": 3752.319, | |
"text": "before we end this lecture and move on" | |
}, | |
{ | |
"start": 3754.16, | |
"text": "to lecture number two is overfitting" | |
}, | |
{ | |
"start": 3757.079, | |
"text": "right so overfitting is this idea that" | |
}, | |
{ | |
"start": 3759.559, | |
"text": "is actually not a deep learning Centric" | |
}, | |
{ | |
"start": 3761.559, | |
"text": "problem at all it's it's a problem that" | |
}, | |
{ | |
"start": 3763.0, | |
"text": "exists in all of machine learning right" | |
}, | |
{ | |
"start": 3765.52, | |
"text": "the key problem is that and the key" | |
}, | |
{ | |
"start": 3769.0, | |
"text": "problem is actually one" | |
}, | |
{ | |
"start": 3771.44, | |
"text": "that addresses how you can accurately" | |
}, | |
{ | |
"start": 3774.64, | |
"text": "Define if if your model is is actually" | |
}, | |
{ | |
"start": 3778.319, | |
"text": "capturing your true data set right or if" | |
}, | |
{ | |
"start": 3781.52, | |
"text": "it's just learning kind of the subtle" | |
}, | |
{ | |
"start": 3783.44, | |
"text": "details that are kind of sply" | |
}, | |
{ | |
"start": 3786.279, | |
"text": "correlating to your data set so said" | |
}, | |
{ | |
"start": 3789.119, | |
"text": "differently let me say it a bit" | |
}, | |
{ | |
"start": 3790.52, | |
"text": "differently now so let's say we want to" | |
}, | |
{ | |
"start": 3793.4, | |
"text": "build models that can learn" | |
}, | |
{ | |
"start": 3796.4, | |
"text": "representations okay from our training" | |
}, | |
{ | |
"start": 3798.48, | |
"text": "data that still generalize to brand new" | |
}, | |
{ | |
"start": 3801.72, | |
"text": "unseen test points right that's the real" | |
}, | |
{ | |
"start": 3804.2, | |
"text": "goal here is we want to teach our model" | |
}, | |
{ | |
"start": 3806.119, | |
"text": "something based on a lot of training" | |
}, | |
{ | |
"start": 3807.4, | |
"text": "data but then we don't want it to do" | |
}, | |
{ | |
"start": 3809.079, | |
"text": "well in the training data we want it to" | |
}, | |
{ | |
"start": 3810.4, | |
"text": "do well when we deploy it into the real" | |
}, | |
{ | |
"start": 3812.68, | |
"text": "world and it's seeing things that it has" | |
}, | |
{ | |
"start": 3814.2, | |
"text": "never seen during training so the" | |
}, | |
{ | |
"start": 3816.64, | |
"text": "concept of overfitting is exactly" | |
}, | |
{ | |
"start": 3819.319, | |
"text": "addressing that problem overfitting" | |
}, | |
{ | |
"start": 3821.48, | |
"text": "means if if your model is doing very" | |
}, | |
{ | |
"start": 3825.319, | |
"text": "well on your training data but very" | |
}, | |
{ | |
"start": 3827.0, | |
"text": "badly in testing it pro it's that means" | |
}, | |
{ | |
"start": 3830.279, | |
"text": "it's overfitting it's overfitting to the" | |
}, | |
{ | |
"start": 3832.96, | |
"text": "training data that it saw on the other" | |
}, | |
{ | |
"start": 3834.64, | |
"text": "hand there's also underfitting" | |
}, | |
{ | |
"start": 3836.319, | |
"text": "right on the left hand side you can see" | |
}, | |
{ | |
"start": 3838.44, | |
"text": "basically not fitting the data enough" | |
}, | |
{ | |
"start": 3841.48, | |
"text": "which means that you know you're going" | |
}, | |
{ | |
"start": 3842.88, | |
"text": "to achieve very similar performance on" | |
}, | |
{ | |
"start": 3844.48, | |
"text": "your testing distribution but both are" | |
}, | |
{ | |
"start": 3846.799, | |
"text": "underperforming the actual capabilities" | |
}, | |
{ | |
"start": 3849.279, | |
"text": "of your system now ideally you want to" | |
}, | |
{ | |
"start": 3851.68, | |
"text": "end up somewhere in the middle which is" | |
}, | |
{ | |
"start": 3853.88, | |
"text": "not too complex where you're memorizing" | |
}, | |
{ | |
"start": 3856.039, | |
"text": "all of the nuances in your training data" | |
}, | |
{ | |
"start": 3858.2, | |
"text": "like on the right but you still want to" | |
}, | |
{ | |
"start": 3860.48, | |
"text": "continue to perform well even based on" | |
}, | |
{ | |
"start": 3863.48, | |
"text": "the brand new data so you're not" | |
}, | |
{ | |
"start": 3864.599, | |
"text": "underfitting as well" | |
}, | |
{ | |
"start": 3866.599, | |
"text": "so to talk to actually address this" | |
}, | |
{ | |
"start": 3868.64, | |
"text": "problem in neural networks and in" | |
}, | |
{ | |
"start": 3870.2, | |
"text": "machine learning in general there's a" | |
}, | |
{ | |
"start": 3871.44, | |
"text": "few different ways that you should be" | |
}, | |
{ | |
"start": 3873.119, | |
"text": "aware of and how to do it because you'll" | |
}, | |
{ | |
"start": 3874.96, | |
"text": "need to apply them as part of your" | |
}, | |
{ | |
"start": 3877.279, | |
"text": "Solutions and your software Labs as well" | |
}, | |
{ | |
"start": 3879.72, | |
"text": "so the key concept here is called" | |
}, | |
{ | |
"start": 3881.559, | |
"text": "regularization right regularization is a" | |
}, | |
{ | |
"start": 3883.88, | |
"text": "technique that you can introduce and" | |
}, | |
{ | |
"start": 3886.559, | |
"text": "said very simply all regularization is" | |
}, | |
{ | |
"start": 3889.2, | |
"text": "is a way to discourage your model" | |
}, | |
{ | |
"start": 3893.119, | |
"text": "from from these nuances in your training" | |
}, | |
{ | |
"start": 3897.0, | |
"text": "data from being learned that's all it is" | |
}, | |
{ | |
"start": 3899.839, | |
"text": "and as we've seen before it's actually" | |
}, | |
{ | |
"start": 3901.319, | |
"text": "critical for our models to be able to" | |
}, | |
{ | |
"start": 3903.119, | |
"text": "generalize you know not just on training" | |
}, | |
{ | |
"start": 3905.319, | |
"text": "data but really what we care about is" | |
}, | |
{ | |
"start": 3907.16, | |
"text": "the testing data so the most popular" | |
}, | |
{ | |
"start": 3909.92, | |
"text": "regularization technique that's" | |
}, | |
{ | |
"start": 3911.599, | |
"text": "important for you to understand is this" | |
}, | |
{ | |
"start": 3913.799, | |
"text": "very simple idea called Dropout let's" | |
}, | |
{ | |
"start": 3916.92, | |
"text": "revisit this picture of a deep neural" | |
}, | |
{ | |
"start": 3918.559, | |
"text": "network that we've been seeing all" | |
}, | |
{ | |
"start": 3920.0, | |
"text": "lecture right in Dropout our training" | |
}, | |
{ | |
"start": 3922.799, | |
"text": "during training what we're going to do" | |
}, | |
{ | |
"start": 3924.88, | |
"text": "is randomly set some of the activations" | |
}, | |
{ | |
"start": 3927.839, | |
"text": "right these outputs of every single" | |
}, | |
{ | |
"start": 3929.799, | |
"text": "neuron to zero we're just randomly going" | |
}, | |
{ | |
"start": 3932.559, | |
"text": "to set them to zero with some" | |
}, | |
{ | |
"start": 3934.2, | |
"text": "probability right so let's say 50% is" | |
}, | |
{ | |
"start": 3937.72, | |
"text": "our probability that means that we're" | |
}, | |
{ | |
"start": 3940.0, | |
"text": "going to take all of the activation in" | |
}, | |
{ | |
"start": 3942.64, | |
"text": "our in our neural network and with a" | |
}, | |
{ | |
"start": 3944.92, | |
"text": "probability of 50% before we pass that" | |
}, | |
{ | |
"start": 3947.359, | |
"text": "activation onto the next neuron we're" | |
}, | |
{ | |
"start": 3949.4, | |
"text": "just going to set it to zero and not" | |
}, | |
{ | |
"start": 3951.88, | |
"text": "pass on anything so effectively 50% of" | |
}, | |
{ | |
"start": 3954.76, | |
"text": "the neurons are are going to be kind of" | |
}, | |
{ | |
"start": 3957.359, | |
"text": "shut down or killed in a forward pass" | |
}, | |
{ | |
"start": 3959.96, | |
"text": "and you're only going to forward pass" | |
}, | |
{ | |
"start": 3961.64, | |
"text": "information with the other 50% of your" | |
}, | |
{ | |
"start": 3964.079, | |
"text": "neurons so this idea is extremely" | |
}, | |
{ | |
"start": 3966.64, | |
"text": "powerful actually because it lowers the" | |
}, | |
{ | |
"start": 3968.599, | |
"text": "capacity of our neural network it not" | |
}, | |
{ | |
"start": 3970.64, | |
"text": "only lowers the capacity of our neural" | |
}, | |
{ | |
"start": 3972.359, | |
"text": "network but it's dynamically lowering it" | |
}, | |
{ | |
"start": 3974.599, | |
"text": "because on the next iteration we're" | |
}, | |
{ | |
"start": 3976.52, | |
"text": "going to pick a different 50% of neurons" | |
}, | |
{ | |
"start": 3978.72, | |
"text": "that we drop out so constantly the" | |
}, | |
{ | |
"start": 3980.68, | |
"text": "network is going to have to learn to" | |
}, | |
{ | |
"start": 3982.68, | |
"text": "build Pathways different pathways from" | |
}, | |
{ | |
"start": 3985.799, | |
"text": "input to output and that it can't rely" | |
}, | |
{ | |
"start": 3988.16, | |
"text": "on any small any small part of the" | |
}, | |
{ | |
"start": 3990.319, | |
"text": "features that are present in any part of" | |
}, | |
{ | |
"start": 3992.52, | |
"text": "the training data set too extensively" | |
}, | |
{ | |
"start": 3994.72, | |
"text": "right because it's constantly being" | |
}, | |
{ | |
"start": 3995.96, | |
"text": "forced to find these different Pathways" | |
}, | |
{ | |
"start": 3998.52, | |
"text": "with random" | |
}, | |
{ | |
"start": 4000.359, | |
"text": "probabilities so that's Dropout the" | |
}, | |
{ | |
"start": 4002.599, | |
"text": "second regularization technique is going" | |
}, | |
{ | |
"start": 4004.76, | |
"text": "to be this notion called early stopping" | |
}, | |
{ | |
"start": 4006.72, | |
"text": "which is actually something that is" | |
}, | |
{ | |
"start": 4008.96, | |
"text": "model agnostic you can apply this to any" | |
}, | |
{ | |
"start": 4011.039, | |
"text": "type of model as long as you have a" | |
}, | |
{ | |
"start": 4012.44, | |
"text": "testing set that you can play around" | |
}, | |
{ | |
"start": 4013.96, | |
"text": "with so the idea here" | |
}, | |
{ | |
"start": 4016.039, | |
"text": "is that we have already a pretty formal" | |
}, | |
{ | |
"start": 4019.0, | |
"text": "mathematical definition of what it means" | |
}, | |
{ | |
"start": 4021.359, | |
"text": "to overfit right overfitting is just" | |
}, | |
{ | |
"start": 4023.88, | |
"text": "when our model starts to perform worse" | |
}, | |
{ | |
"start": 4026.0, | |
"text": "on our test set that's really all it is" | |
}, | |
{ | |
"start": 4028.559, | |
"text": "right so what if we plot over the course" | |
}, | |
{ | |
"start": 4031.44, | |
"text": "of training so x-axis is as we're" | |
}, | |
{ | |
"start": 4033.16, | |
"text": "training the model let's look at the" | |
}, | |
{ | |
"start": 4035.16, | |
"text": "performance on both the training set and" | |
}, | |
{ | |
"start": 4037.24, | |
"text": "the test set so in the beginning you can" | |
}, | |
{ | |
"start": 4040.039, | |
"text": "see that the training set and the test" | |
}, | |
{ | |
"start": 4041.92, | |
"text": "set are both going down and they" | |
}, | |
{ | |
"start": 4043.839, | |
"text": "continue to go down uh which is" | |
}, | |
{ | |
"start": 4046.079, | |
"text": "excellent because it means that our" | |
}, | |
{ | |
"start": 4047.16, | |
"text": "model is getting stronger eventually" | |
}, | |
{ | |
"start": 4049.119, | |
"text": "though what you'll notice is that the" | |
}, | |
{ | |
"start": 4050.92, | |
"text": "test loss plateaus and starts to" | |
}, | |
{ | |
"start": 4054.72, | |
"text": "increase on the other hand the training" | |
}, | |
{ | |
"start": 4057.0, | |
"text": "loss there's no reason why the training" | |
}, | |
{ | |
"start": 4058.839, | |
"text": "loss should ever need to stop going down" | |
}, | |
{ | |
"start": 4061.279, | |
"text": "right training losses generally always" | |
}, | |
{ | |
"start": 4063.2, | |
"text": "continue to Decay as long as there is" | |
}, | |
{ | |
"start": 4066.599, | |
"text": "capacity in the neural network to learn" | |
}, | |
{ | |
"start": 4069.2, | |
"text": "those differences right but the" | |
}, | |
{ | |
"start": 4070.72, | |
"text": "important point is that this continues" | |
}, | |
{ | |
"start": 4073.24, | |
"text": "for the rest of training and we want to" | |
}, | |
{ | |
"start": 4075.2, | |
"text": "BAS basically we care about this point" | |
}, | |
{ | |
"start": 4077.64, | |
"text": "right here right this is the really" | |
}, | |
{ | |
"start": 4079.119, | |
"text": "important point because this is where we" | |
}, | |
{ | |
"start": 4081.76, | |
"text": "need to stop training right after this" | |
}, | |
{ | |
"start": 4083.76, | |
"text": "point this is the happy medium because" | |
}, | |
{ | |
"start": 4085.72, | |
"text": "after this point we start to overfit on" | |
}, | |
{ | |
"start": 4089.319, | |
"text": "parts of the data where our training" | |
}, | |
{ | |
"start": 4091.039, | |
"text": "accuracy becomes actually better than" | |
}, | |
{ | |
"start": 4093.2, | |
"text": "our testing accuracy so our testing" | |
}, | |
{ | |
"start": 4094.64, | |
"text": "accuracy is going bad it's getting worse" | |
}, | |
{ | |
"start": 4097.319, | |
"text": "but our training accuracy is still" | |
}, | |
{ | |
"start": 4098.719, | |
"text": "improving so it means overfitting on the" | |
}, | |
{ | |
"start": 4100.88, | |
"text": "other hand on the left hand" | |
}, | |
{ | |
"start": 4102.839, | |
"text": "side this is the opposite problem right" | |
}, | |
{ | |
"start": 4105.64, | |
"text": "we have not fully utilized the capacity" | |
}, | |
{ | |
"start": 4107.719, | |
"text": "of our model and the testing accuracy" | |
}, | |
{ | |
"start": 4109.839, | |
"text": "can still improve further right this is" | |
}, | |
{ | |
"start": 4112.48, | |
"text": "a very powerful idea but it's actually" | |
}, | |
{ | |
"start": 4114.52, | |
"text": "extremely easy to implement in practice" | |
}, | |
{ | |
"start": 4116.6, | |
"text": "because all you really have to do is" | |
}, | |
{ | |
"start": 4118.279, | |
"text": "just monitor the loss of over the course" | |
}, | |
{ | |
"start": 4120.759, | |
"text": "of training right and you just have to" | |
}, | |
{ | |
"start": 4122.199, | |
"text": "pick the model where the testing" | |
}, | |
{ | |
"start": 4123.96, | |
"text": "accuracy starts to get" | |
}, | |
{ | |
"start": 4126.64, | |
"text": "worse so I'll conclude this lecture by" | |
}, | |
{ | |
"start": 4128.92, | |
"text": "just summarizing three key points that" | |
}, | |
{ | |
"start": 4130.92, | |
"text": "we've cover covered in the class so far" | |
}, | |
{ | |
"start": 4133.319, | |
"text": "and this is a very g-pack class so the" | |
}, | |
{ | |
"start": 4136.08, | |
"text": "entire week is going to be like this and" | |
}, | |
{ | |
"start": 4138.08, | |
"text": "today is just the start so so far we've" | |
}, | |
{ | |
"start": 4140.359, | |
"text": "learned the fundamental building blocks" | |
}, | |
{ | |
"start": 4142.44, | |
"text": "of neural network starting all the way" | |
}, | |
{ | |
"start": 4144.239, | |
"text": "from just one neuron also called a" | |
}, | |
{ | |
"start": 4145.92, | |
"text": "perceptron we learned that we can stack" | |
}, | |
{ | |
"start": 4148.48, | |
"text": "these systems on top of each other to" | |
}, | |
{ | |
"start": 4151.0, | |
"text": "create a hierarchical network and how we" | |
}, | |
{ | |
"start": 4154.08, | |
"text": "can mathematically optimize those types" | |
}, | |
{ | |
"start": 4156.279, | |
"text": "of systems and then finally in the very" | |
}, | |
{ | |
"start": 4158.04, | |
"text": "very last part of the class we talked" | |
}, | |
{ | |
"start": 4159.6, | |
"text": "about just techniques tips and" | |
}, | |
{ | |
"start": 4161.719, | |
"text": "techniques for actually training and" | |
}, | |
{ | |
"start": 4163.52, | |
"text": "applying these systems into practice ice" | |
}, | |
{ | |
"start": 4166.359, | |
"text": "now in the next lecture we're going to" | |
}, | |
{ | |
"start": 4167.88, | |
"text": "hear from Ava on deep sequence modeling" | |
}, | |
{ | |
"start": 4170.759, | |
"text": "using rnns and also a really new and" | |
}, | |
{ | |
"start": 4174.52, | |
"text": "exciting algorithm and type of model" | |
}, | |
{ | |
"start": 4176.88, | |
"text": "called the Transformer which uh is built" | |
}, | |
{ | |
"start": 4180.279, | |
"text": "off of this principle of attention" | |
}, | |
{ | |
"start": 4182.239, | |
"text": "you're going to learn about it in the" | |
}, | |
{ | |
"start": 4183.4, | |
"text": "next class but let's for now just take a" | |
}, | |
{ | |
"start": 4185.679, | |
"text": "brief pause and let's resume in about" | |
}, | |
{ | |
"start": 4187.64, | |
"text": "five minutes just so we can switch" | |
}, | |
{ | |
"start": 4188.96, | |
"text": "speakers and Ava can start her" | |
}, | |
{ | |
"start": 4191.199, | |
"text": "presentation okay thank you" | |
} | |
] |