Yannael_LB committed on
Commit 3730a63 · 1 Parent(s): ba7a195
Files changed (3)
  1. ErnWZxJovaM.json +468 -0
  2. app.py +124 -79
  3. utils.py +426 -0
ErnWZxJovaM.json ADDED
@@ -0,0 +1,468 @@
1
+ [
2
+ {
3
+ "num_chapter": 0,
4
+ "title": "Introduction to the Course and AI Advancements",
5
+ "start_paragraph_number": 0,
6
+ "end_paragraph_number": 17,
7
+ "start_time": 8,
8
+ "end_time": 467,
9
+ "paragraphs": [
10
+ "Good afternoon, everyone, and welcome to MIT 6.S191. My name is Alexander Amini, and I'll be one of your instructors for the course this year, along with Ava. We're really excited to welcome you to this incredible course.",
11
+ "This is a fast-paced and intense one-week course that we're about to go through together. We'll be covering the foundations of a rapidly changing field, and a field that has been revolutionizing many areas of science, mathematics, physics, and more.",
12
+ "Over the past decade, AI and deep learning have been rapidly advancing and solving problems that we didn't think were solvable in our lifetimes. Today, AI is solving problems beyond human performance, and each year, this lecture is getting harder and harder to teach because it's supposed to cover the foundations of the field.",
13
+ "If you think about any other introductory course, like a 101 course on mathematics or biology, the first lecture doesn't really change much over time. But we're in a rapidly changing field of AI and deep learning where even these types of lectures are rapidly changing.",
14
+ "Let me give you an example of how we introduced this course only a few years ago: \"Hi, everybody, and welcome to MIT 6.S191, the official introductory course on deep learning, taught here at MIT. Deep learning is revolutionizing many fields, from robotics to medicine and everything in between. You'll learn the fundamentals of this field and how you can build these incredible algorithms.\"",
15
+ "In fact, this entire speech and video are not real and were created using deep learning and artificial intelligence.",
16
+ "I'm excited to share my knowledge with you as we explore the world of artificial intelligence and deep learning together. We'll be covering a lot of ground in this course, and I'm looking forward to it.",
17
+ "Artificial intelligence and deep learning have come a long way in this class. It's been an honor to speak with you today, and I hope you enjoy the course.",
18
+ "The really surprising thing about that video to me was how viral it went a few years ago. When we first did it, it was shocking to see how quickly it spread. Within a couple of months, the video got over a million views. People were amazed by the realism of AI, which could generate content that looked and sounded extremely hyperrealistic.",
19
+ "When we created that video for the class a few years ago, it took us about $10,000 and a lot of compute to generate just about a minute-long video. That's extremely expensive to compute something that looks like that. Maybe some of you aren't even impressed by the technology today because you see all the amazing things that AI and deep learning are producing now.",
20
+ "Fast-forward to today, and the progress in deep learning is incredible. People were making all kinds of exciting remarks about it when it came out a few years ago. Now, this is common stuff because AI is really doing much more powerful things than this fun little introductory video.",
21
+ "So, today, fast-forward four years, and we're in a completely different place. AI is now generating content with deep learning being so commoditized. Deep learning is in all of our fingertips now, online in our smartphones, and so on. In fact, we can use deep learning to generate these types of hyperrealistic pieces of media and content entirely from English language without even coding anymore.",
22
+ "Before, we had to actually go in and train these models and really code them to be able to create that one-minute-long video. Today, we have models that will do that for us end-to-end directly from English language. We can use these models to create something that the world has never seen before.",
23
+ "Here's the improved version of the text:",
24
+ "'Imagine a photo of an astronaut riding a horse. These models can create those pieces of content entirely from scratch. My personal favorite is actually how we can now ask these deep learning models to create new types of software, even software that can ask them to create something. For example, we can ask a neural network to write TensorFlow code to train another neural network. Our model can produce examples of functional and usable pieces of code that satisfy this English prompt, while walking through each part of the code independently, so not just producing it but actually educating and teaching the user on what each part of these code blocks are actually doing.",
25
+ "You can see examples here. What I'm trying to show you with all of this is that this is just highlighting how far deep learning has gone, even in just a couple of years since we've started teaching this course. Going back even from before that, to eight years ago, the most amazing thing that you'll see in this course, in my opinion, is that what we try to do here is to teach you the foundations of all of this, how all of these different types of models are created from the ground up, and how we can make all of these amazing advances possible, so that you can also do it on your own as well.",
26
+ "As I mentioned at the beginning, this introductory course is becoming increasingly challenging every year, and I'm unsure where the field will be headed next year. Honestly, it's difficult to predict, given how rapidly it's evolving. However, I do know that what we'll share with you in this course, over the next week, will provide the foundations of all the technologies we've seen so far, enabling you to create your own future and design brand new deep learning models using those fundamentals."
27
+ ],
28
+ "paragraph_timestamps": [
29
+ 8,
30
+ 21,
31
+ 65,
32
+ 94,
33
+ 113,
34
+ 144,
35
+ 457,
36
+ 160,
37
+ 174,
38
+ 215,
39
+ 236,
40
+ 253,
41
+ 284,
42
+ 300,
43
+ 302,
44
+ 353,
45
+ 386
46
+ ]
47
+ },
48
+ {
49
+ "num_chapter": 1,
50
+ "title": "Foundations of Intelligence and Deep Learning",
51
+ "start_paragraph_number": 17,
52
+ "end_paragraph_number": 24,
53
+ "start_time": 467,
54
+ "end_time": 567,
55
+ "paragraphs": [
56
+ "Let's get started by exploring the foundations from the very beginning and asking ourselves what intelligence is at its core \u2013 not just artificial intelligence, but intelligence itself. In my opinion, intelligence is the ability to process information, which informs our future decision-making abilities. This is something we humans do every day.",
57
+ "Artificial intelligence is simply the ability to give computers the same capacity to process information and inform future decisions. Machine learning is a subset of artificial intelligence, and it's the science of teaching computers how to process information and make decisions from data, rather than hardcoding rules into machines.",
58
+ "Deep learning is a subset of machine learning that uses neural networks to process raw, unprocessed data and ingest large datasets.",
59
+ "'Let's start by tackling the foundations from the beginning and exploring what intelligence is at its core. We'll examine how this concept of deep learning relates to other scientific concepts you've learned about so far.'",
60
+ "'Now, let's break down the key components: artificial intelligence, machine learning, and deep learning. Artificial intelligence is the ability to give computers the capacity to process information and inform future decisions. Machine learning is the science of teaching computers how to process information and make decisions from data. Deep learning is a subset of machine learning that uses neural networks to process raw data and ingest large datasets.'",
61
+ "'I'd like to emphasize that intelligence is the ability to process information, which informs our future decision-making abilities. This is something we humans do every day. Artificial intelligence is simply the ability to give computers the same capacity to process information and inform future decisions.'",
62
+ "Now, let's dive deeper into the concept of deep learning and how it relates to other scientific concepts you've learned about so far."
63
+ ],
64
+ "paragraph_timestamps": [
65
+ 467,
66
+ 487,
67
+ 537,
68
+ 457,
69
+ 487,
70
+ 476,
71
+ 558
72
+ ]
73
+ },
74
+ {
75
+ "num_chapter": 2,
76
+ "title": "Course Overview and Structure",
77
+ "start_paragraph_number": 24,
78
+ "end_paragraph_number": 37,
79
+ "start_time": 567,
80
+ "end_time": 833,
81
+ "paragraphs": [
82
+ "That's exactly what this class is all about: teaching machines how to process data, process information, and inform decision-making abilities from that data, and learn from that data. This program is split between two different parts, so you should think of this class as being comprised of both technical lectures and software labs.",
83
+ "We'll have several new updates this year, covering the rapidly changing advances in AI, especially in some of the later lectures. The first lecture today will cover the foundations of neural networks themselves, starting with the building block of every single neural network, which is called the perceptron. We'll conclude with a series of exciting guest lectures from industry-leading sponsors of the course.",
84
+ "On the software side, after every lecture, you'll also get software experience and project-building experience to be able to take what we teach in lectures and actually deploy them in real code and produce something based on the learnings that you find in this lecture. At the very end of the class, from the software side, you'll have the ability to participate in a really fun day, which is the project pitch competition - it's kind of like a shark tank-style competition of all of the different projects from all of you, and win some really awesome prizes.",
85
+ "Here is the rewritten syllabus with improved readability:",
86
+ "The syllabus for this lecture is as follows:",
87
+ "Each day, we will have dedicated software labs that mirror the technical lectures, helping you reinforce your learnings. These labs are coupled with prizes for the top-performing software solutions.",
88
+ "The course will start with Lab One, which focuses on music generation. You will learn how to build a neural network that can learn from musical songs and then compose brand new songs in the same genre.",
89
+ "Tomorrow, Lab Two will cover computer vision, where you will learn about facial detection systems and build one from scratch using convolutional neural networks. You will also learn how to debias these systems, which is a significant problem in state-of-the-art solutions today.",
90
+ "Finally, a new Lab at the end of the course will focus on large language models. You will fine-tune a billion-parameter large language model to build an assistive chatbot and evaluate its cognitive abilities, including mathematics, scientific reasoning, and logical abilities.",
91
+ "At the end of the course, there will be a final project pitch competition, where each team will have up to 5 minutes to present their project. All of these labs are accompanied by great prizes, so there will be plenty of fun throughout the week.",
92
+ "There are many resources available to help with this class, which can be found on the course slides. If you have any questions, please post them on Piazza. The teaching team is also available to answer any questions you may have.",
93
+ "Ava and I will be the main lecturers for this course, especially on Monday through Wednesday. We will also have some amazing guest lectures in the second half of the course, which cover the latest advancements in deep learning, particularly in industry outside of academia.",
94
+ "I'd like to take a moment to express my gratitude to all of our sponsors, without whom this course, like every year, would not be possible."
95
+ ],
96
+ "paragraph_timestamps": [
97
+ 567,
98
+ 598,
99
+ 623,
100
+ 661,
101
+ 661,
102
+ 664,
103
+ 684,
104
+ 698,
105
+ 727,
106
+ 748,
107
+ 768,
108
+ 781,
109
+ 807
110
+ ]
111
+ },
112
+ {
113
+ "num_chapter": 3,
114
+ "title": "Introduction to Deep Learning",
115
+ "start_paragraph_number": 37,
116
+ "end_paragraph_number": 46,
117
+ "start_time": 833,
118
+ "end_time": 1044,
119
+ "paragraphs": [
120
+ "Now, let's dive into the technical aspects of the course, which is my favorite part. To understand why we care about deep learning, I think we need to go back a bit and understand how machine learning used to be performed.",
121
+ "Typically, machine learning would involve defining a set of features, which are essentially a set of things to look for in an image or piece of data. These features are usually hand-engineered by humans, but the problem with this approach is that they tend to be very brittle in practice.",
122
+ "The key idea of deep learning is to move away from hand-engineering features and rules, and instead, try to learn them directly from raw pieces of data. So, what are the patterns that we need to look at in data sets, such that if we look at those patterns, we can make some interesting decisions and take interesting actions?",
123
+ "For example, if we wanted to learn how to detect faces, we might look for certain patterns in a picture. What are we looking for to detect a face? We're looking for eyes, noses, and ears, and when those things are all composed in a certain way, we would probably deduce that it's a face. Computers do something very similar, so they have to understand what the patterns are that they look for, what the eyes and noses and ears of those pieces of data are, and then they can make decisions based on those patterns.",
124
+ "Computers do something very similar: they have to understand what the patterns are that they look for, what the eyes and noses and ears of those pieces of data are, and then they can make decisions based on those patterns.",
125
+ "The really interesting thing I think about deep learning is that these foundations for doing exactly what I just mentioned - picking out the building blocks, picking out the features from raw pieces of data and the underlying algorithms themselves - have existed for many, many decades now. The question I would ask at this point is, so why are we studying this now and why is all of this really blowing up right now and exploding with so many great advances?",
126
+ "Well, for one, there are three things. Number one is that the data that is available to us today is significantly more pervasive. These models are hungry for data, and you're going to learn about this more in detail, but these models are extremely hungry for data. And we're living in a world right now, quite frankly, where data is more abundant than it has ever been in our history.",
127
+ "Secondly, these algorithms are massively compute-hungry, and they're massively parallelizable, which means that they have greatly benefited from compute hardware, which is also capable of being parallelized. The particular name of that hardware is called a GPU. GPUs can run parallel processing streams of information and are particularly amenable to deep learning algorithms. And the abundance of GPUs and that compute hardware has also pushed forward what we can do in deep learning.",
128
+ "Finally, the last piece is the software. It's the open-source tools that are really used as the foundational building blocks of deploying and building all of these underlying models that you're going to learn about in this course. And those open-source tools have just become extremely streamlined, making it extremely easy for all of us to learn about these technologies within an amazing one-week course like this."
129
+ ],
130
+ "paragraph_timestamps": [
131
+ 833,
132
+ 848,
133
+ 871,
134
+ 898,
135
+ 911,
136
+ 928,
137
+ 958,
138
+ 981,
139
+ 1014
140
+ ]
141
+ },
142
+ {
143
+ "num_chapter": 4,
144
+ "title": "Introduction to Perceptrons",
145
+ "start_paragraph_number": 46,
146
+ "end_paragraph_number": 66,
147
+ "start_time": 1044,
148
+ "end_time": 1747,
149
+ "paragraphs": [
150
+ "So, let's start now with understanding exactly what is the fundamental building block of a neural network. That building block is called a perceptron. Every single perceptron, every single neural network is built up of multiple perceptrons, and you're going to learn how those perceptrons number one compute information themselves and how they connect to these much larger billion-parameter neural networks. The key idea of a perceptron, or even simpler, think of this as a single neuron. So, a neural network is composed of many, many neurons, and a perceptron is just one neuron. That idea of a perceptron is actually extremely simple, and I hope that by the end of today, this idea and this processing of a perceptron becomes extremely clear to you.",
151
+ "So, let's start by talking about just the forward propagation of information through a single neuron. Single neurons ingest information; they can actually ingest multiple pieces of information. So, here you can see this neuron taking in three pieces of information: X1, X2, and XM. We define the set of inputs as X1 through XM, and each of these inputs, each of these numbers, is going to be element-wise multiplied by a particular weight. So, this is going to be denoted here by W1 through WM, so this is a corresponding weight for every single input. You should think of this as really, you know, every weight being assigned to that input. The weights are part of the neuron itself.",
152
+ "Now, you multiply all of these inputs with their weights together and then you add them up. We take this single number after that addition and pass it through what's called a nonlinear activation function to produce your final output, which here, we'll be calling y.",
153
+ "Now, what I just said is not entirely correct, right? So, I missed out one critical piece of information. That piece of information is that we also have what you can see here, called the bias term. That bias term is actually what allows your neuron to shift its activation function horizontally on that x-axis, if you think of it. So, on the right side, you can now see this diagram illustrating that.",
154
+ "'Here, you can see the single equation that I talked through conceptually right now, mathematically written down as one single equation. We can actually rewrite this using linear algebra, using vectors and dot products. So, let's do that. Right now, our inputs are going to be described by a capital X, which is simply a vector of all of our inputs X1 through XM. And then our weights are going to be described by a capital W, which is going to be W1 through WM.'",
155
+ "'The input is obtained by taking the dot product of X and W. That dot product does an element-wise multiplication and then adds up all the element-wise multiplications. And then here's the missing piece: we're now going to add that bias term. We're calling the bias term W0, right? And then we're going to apply the nonlinearity, which here is denoted as Z or G, excuse me.'",
156
+ "'I've mentioned this nonlinearity a few times, this activation function. Let's dig into it a little bit more, so we can understand what is actually this activation function doing. Well, I said a couple of things about it. I said it's a nonlinear function, right? Here, you can see one example of an activation function, one common, uh, one commonly used activation function is called the sigmoid function, which you can actually see here on the bottom right-hand side of the screen.'",
157
+ "'The sigmoid function is very commonly used because of its outputs, right? It takes as input any real number, the x-axis is infinite, plus or minus, but on the Y-axis, it basically squashes every input X into a number between 0 and 1. So, it's actually a very common choice for things like probability distributions if you want to convert your answers into probabilities or teach a neuron to learn a probability distribution.'",
158
+ "But in fact, there are many different types of nonlinear activation functions used in neural networks. Here are some common ones. Again, throughout this presentation, you'll see these little TensorFlow icons. These icons are used throughout the entire course to relate the foundational knowledge taught in lectures to the software labs. This might provide a good starting point for many of the pieces you'll need to do later on in the software parts of the class.",
159
+ "The sigmoid activation function, which we discussed in the last slide, is shown on the left-hand side. This is very popular because of its probability distributions. It squashes everything between zero and one. You'll also see two other very common types of activation functions in the middle and right-hand side: the tanh activation function and the relu activation function, also known as the rectified linear unit.",
160
+ "The relu activation function is very easy to compute and still has the nonlinearity we need. We'll talk about why we need it in a moment. It's very fast, just two linear functions piecewise combined with each other.",
161
+ "Now, let's talk about why we need a nonlinearity in the first place. Why not just deal with a linear function that we pass all of these inputs through? The point of the activation function is to introduce nonlinearity itself. What we want to do is allow our neural network to deal with nonlinear data. Our neural networks need the ability to deal with nonlinear data because the world is extremely nonlinear. This is important because, if you think of real-world datasets, this is just the way they are. If you look at datasets like this one, with green and red points, and I ask you to build a neural network to classify them, you'll see that the data is nonlinear. What can separate the green and the red points? This means that we actually need a nonlinear function to do that. We cannot solve this problem with a single line; in fact, if we used linear functions as our activation function, no matter how big our neural network is, it's still a linear function because linear functions combined with linear functions are still linear. So, no matter how deep or how many parameters your neural network has, the best it would be able to do to separate these green and red points would look like this.",
162
+ "But adding nonlinearity allows our neural networks to be smaller by allowing them to be more expressive and capture more complexities in the datasets, and this allows them to be much more powerful in the end.",
163
+ "So, let's understand this with a simple example. Imagine I give you a trained neural network. What does it mean, trained neural network? It means I'm giving you the weights, right, not only the inputs but I'm going to tell you what the weights of this neural network are. So, here, let's say the bias term w0 is going to be one, and our W vector is going to be 3 and -2, right? These are just the weights of your trained neural network. Let's worry about how we got those weights in a second, but this network has two inputs, X1 and X2.",
164
+ "Now, if we want to get the output of this neural network, all we have to do is simply do the same story that we talked about before: it's a dot product, inputs with weights, add the bias, and apply the nonlinearity. Right, and those are the three components that you really have to remember as part of this class: dot product, add the bias, and apply a nonlinearity. That's going to be the process that keeps repeating over and over and over again for every single neuron after that happens, and that neuron is going to output a single number.",
165
+ "Let's take a look at what's inside that nonlinearity. It's simply a weighted combination of those inputs with those weights. So, what's inside G, right inside G, is a weighted combination of X and W, added with a bias, which is going to produce a single number. But in reality, for any input that this model could see, what this really is is a two-dimensional line because we have two parameters in this model. So, we can actually plot that line, and we can see exactly how this neuron separates points on these axes between X1 and X2. These are the two inputs of this model. We can see exactly and interpret exactly what this neuron is doing. We can visualize its entire space because we can plot the line that defines this neuron.",
166
+ "So, here we're plotting when that line equals zero. And in fact, if I give you that neuron a new data point \u2013 the new data point is X1 = -1 and X2 = 2, just an arbitrary point in this two-dimensional space \u2013 we can plot that point in the two-dimensional space. And depending on which side of the line it falls on, it tells us what the answer is going to be, what the sign of the answer is going to be, and also what the answer itself is going to be.",
167
+ "So, if we follow that equation written on the top here and plug in -1 and 2, we're going to get 1 - 3 - 4, which equals -6. Right, and when I put that into my nonlinearity G, I'm going to get a final output of about 0.002. But don't worry about the final output \u2013 that's just going to be the output for that sigmoid function. The important point to remember here is that the sigmoid function actually divides the space into these two parts. It squashes everything between zero and one, but it divides it implicitly by everything less than 0.5 and greater than 0.5, depending on if it's on the left side or the right side of the line.",
168
+ "So, depending on which side of the line you fall on \u2013 remember, the line is where Z equals zero, so the input to the sigmoid is zero \u2013 if you fall on the left side of the line, your output will be less than 0.5, because the input to the sigmoid is negative. If your input is on the right side of the line, your output is going to be greater than 0.5. Right, so here we can actually visualize this space \u2013 this is called the feature space of a neural network \u2013 we can visualize it in its entirety, right? We can totally visualize and interpret this neural network, and we can understand exactly what it's going to do for any input that it sees, right?",
169
+ "But of course, this is a very simple neuron, right? It's not a neural network; it's just one neuron, and even more than that, it's even a very simple neuron \u2013 it only has two inputs, right? So, in reality, the types of neurons that you're going to be dealing with in this course are going to be neurons and neural networks with millions or even billions of these parameters, right? So, here we only have two weights, W1 and W2, but today's neural networks have billions of these parameters. So, drawing these types of plots that you see here obviously becomes a lot more challenging \u2013 it's actually not possible."
170
+ ],
171
+ "paragraph_timestamps": [
172
+ 1044,
173
+ 1093,
174
+ 1139,
175
+ 1156,
176
+ 1184,
177
+ 1213,
178
+ 1240,
179
+ 1266,
180
+ 1294,
181
+ 1329,
182
+ 1367,
183
+ 1379,
184
+ 1454,
185
+ 1470,
186
+ 1501,
187
+ 1538,
188
+ 1593,
189
+ 1621,
190
+ 1669,
191
+ 1710
192
+ ]
193
+ },
194
+ {
195
+ "num_chapter": 5,
196
+ "title": "Building Neural Networks",
197
+ "start_paragraph_number": 66,
198
+ "end_paragraph_number": 71,
199
+ "start_time": 1747,
200
+ "end_time": 1874,
201
+ "paragraphs": [
202
+ "But now that we have some of the intuition behind a perceptron, let's start now by building neural networks and seeing how all of this comes together. So, let's revisit that previous diagram of a perceptron. Now, again, if there's only one thing to take away from this lecture, right now, it's to remember how a perceptron works \u2013 that equation of a perceptron is extremely important for every single class that comes after today, and there are only three steps: dot product with the inputs, add a bias, and apply your nonlinearity.",
203
+ "Let's simplify the diagram a little bit. I'll remove the weight labels from this picture, and now you can assume that if I show a line, every single line has an associated weight that comes with that line, right? I'll also remove the bias term for simplicity \u2013 assume that every neuron has that bias term; I don't need to show it.",
204
+ "Now, note that the result here, now calling it Z, which is just the dot product plus bias before the nonlinearity, is the output. The output is going to be linear \u2013 it's just a weighted sum of all those pieces we have, not applied the nonlinearity yet. But our final output is just going to be G of Z, it's the activation function or non-linear activation function applied to Z.",
205
+ "Now, if we want to step this up a little bit more and say what if we had a multi-output function? Now we don't just have one output, but let's say we want to have two outputs. Well, now we can just have two neurons in this network, right? Every neuron sees all of the inputs that came before it, but now you see the top neuron is going to be predicting an answer and the bottom neuron will predict its own answer.",
206
+ "Now, importantly, one thing you should really notice here is that each neuron has its own weights, right? Each neuron has its own lines that are coming into just that neuron, right? So they're acting independently, but they can later on communicate if you have another layer, right?"
207
+ ],
208
+ "paragraph_timestamps": [
209
+ 1747,
210
+ 1781,
211
+ 1802,
212
+ 1828,
213
+ 1856
214
+ ]
215
+ },
216
+ {
217
+ "num_chapter": 6,
218
+ "title": "Building a Neural Network from Scratch",
219
+ "start_paragraph_number": 71,
220
+ "end_paragraph_number": 84,
221
+ "start_time": 1874,
222
+ "end_time": 2291,
223
+ "paragraphs": [
224
+ "So, let's start now by initializing this process a bit further and thinking about it more programmatically, right? What if we wanted to program this neural network ourselves from scratch, right? Remember that equation I told you \u2013 it didn't sound very complex. It's take a dot product, add a bias, which is a single number, and apply nonlinearity. Let's see how we would actually implement something like that.",
225
+ "So, to define the layer, we're going to call this a layer, which is a collection of neurons. We have to first define how that information propagates through the network, so we can do that by creating a call function here. First, we're going to actually define the weights for that network. So, remember, every network, every neuron \u2013 I should say, every neuron has weights and a bias. So, let's define those first.",
226
+ "We're going to create the call function to actually see how we can pass information through that layer. So, this is going to take in input and inputs, right? This is like what we previously called X, and it's the same. We're going to matrix multiply or take a dot product of our inputs with our weights, we're going to add a bias, and then we're going to apply a nonlinearity. It's really that simple. We've now created a single-layer neural network.",
227
+ "So, this line in particular, this is the part that allows us to be a powerful neural network, maintaining that nonlinearity. The important thing here is to note that modern deep learning toolboxes and libraries already implement a lot of these for you. So, it's important for you to understand the foundations, but in practice, all of that layer architecture and all that layer logic is actually implemented in tools like TensorFlow and PyTorch through a dense layer.",
228
+ "Here, you can see an example of calling or creating initializing a dense layer with two neurons, allowing it to feed in an arbitrary set of inputs. Here, we're seeing these two neurons in a layer being fed three inputs, right, and in code, it's only reduced down to this one line of TensorFlow code, making it extremely easy and convenient for us to use these functions and call them.",
229
+ "So, now let's look at our single-layered neural network. This is where we have now one layer between our input and our outputs, right? So, we're slowly and progressively increasing the complexity of our neural network, so that we can build up all of these building blocks. This layer in the middle is called a hidden layer, right, obviously because you don't directly observe it, you don't directly supervise it, right? You do observe the two input and output layers, but your hidden layer is just kind of a \u2013 a neuron layer that you don't directly observe, right? It just gives your network more capacity, more learning complexity.",
230
+ "Since we now have a transformation function from inputs to hidden layers and hidden layers to output, we now have a two-layered neural network, which means that we also have two weight matrices. We don't just have the W1 which we previously had to create this hidden layer, but now we also have W2, which does the transformation from the hidden layer to the output layer. The question was: what happens if the hidden layer is just linear, with no nonlinearity? Yes, so every hidden layer also has a nonlinearity accompanied with it. That's a very important point because if you don't have that, it's just a very large linear function followed by a final nonlinearity at the very end. You need that cascading and overlapping application of nonlinearities that occur throughout the network. Awesome! Okay, so now let's zoom in and look at a single unit in the hidden layer. Take this one for example, let's call it Z2. It's the second neuron in the first layer. We compute its answer by taking a dot product of its weights with its inputs, adding a bias, and then applying a nonlinearity. If we took a different hidden node, like Z3, the one right below it, we would compute its answer exactly the same way that we computed Z2, except its weights would be different than the weights of Z2. Everything else stays exactly the same; it sees the same inputs.",
231
+ "Now, this picture is getting a little bit messy, so let's clean things up a little bit more. I'm going to remove all the lines and replace them with these boxes, these symbols that will denote what we call a fully connected layer. Right, so these layers now denote that everything in our input is connected to everything in our output, and the transformation is exactly as we saw before: dot product, bias, and nonlinearity.",
232
+ "And again, in code, to do this is extremely straightforward with the foundations that we've built up from the beginning of the class. We can now just define two of these dense layers: our hidden layer on line one with n hidden units, and then our output layer with two hidden output units.",
233
+ "Does that mean the nonlinearity function must be the same between layers? No, the nonlinearity function does not need to be the same through each layer. Often, it is because of convenience, but there are some cases where you would want it to be different as well, especially in lecture two, you're going to see nonlinearities be different even within the same layer, um, let alone different layers. But unless there's a particular reason, the general convention is that there's no need to make them different.",
234
+ "Now, let's keep expanding our knowledge a little bit more. If we now want to make a deep neural network, not just a neural network like we saw in the previous slide, now it's deep. All that means is that we're now going to stack these layers on top of each other, one by one, more and more, creating a hierarchical model. Right, the ones where the final output is now going to be computed by going deeper and deeper and deeper into the neural network.",
235
+ "And again, doing this in code again follows the exact same story as before, just cascading these TensorFlow layers on top of each other and just going deeper into the network.",
236
+ "Okay, so now this is great because now we have at least a solid foundational understanding of how to not only define a single neuron but how to define an entire neural network, and you should be able to actually explain at this point or understand how information goes from input through an entire neural network to compute an output."
237
+ ],
238
+ "paragraph_timestamps": [
239
+ 1874,
240
+ 1899,
241
+ 1924,
242
+ 1964,
243
+ 1996,
244
+ 2024,
245
+ 2063,
246
+ 2159,
247
+ 2182,
248
+ 2201,
249
+ 2232,
250
+ 2259,
251
+ 2270
252
+ ]
253
+ },
254
+ {
255
+ "num_chapter": 7,
256
+ "title": "Applying Neural Networks to Real Problems",
257
+ "start_paragraph_number": 84,
258
+ "end_paragraph_number": 90,
259
+ "start_time": 2291,
260
+ "end_time": 2451,
261
+ "paragraphs": [
262
+ "So now let's look at how we can apply these neural networks to solve a very real problem that, I'm sure, all of you care about. Here's a problem on how we want to build an AI system to learn to answer the following question: will I pass this class? I'm sure all of you are really worried about this question.",
263
+ "Um, so to do this, let's start with a simple input feature model. The feature\u2014the two features that let's concern ourselves with are going to be number one, how many lectures you attend, and\u2014",
264
+ "Number two: How many hours do you spend on your final project? So, let's look at some of the past years of this class. Right, we can actually observe how different people have lived in this space, right, between how many lectures and how much time you spend on your final project. And you can actually see every point is a person, the color of that point is going to be if they passed or failed the class, and you can see and visualize kind of this V-shaped feature space, if you will, that we talked about before.",
265
+ "Then, we have you, you fall right here, you're the point (4, 5), uh, right in between the feature space. You've attended four lectures, and you will spend 5 hours on the final project. And you want to build a neural network to determine, given everyone else in the class, right, that I've seen from all of the previous years, you want to have your neural network help you to understand what is your likelihood that you will pass or fail this class.",
266
+ "So, let's do it. We now have all of the building blocks to solve this problem using a neural network. Let's do it. So, we have two inputs: those inputs are the number of lectures you attend and the number of hours you spend on your final project. It's four and five. We can pass those two inputs to our two variables, X1 and X2. These are fed into this single-layered, single-hidden-layered neural network, which has three hidden units in the middle. And we can see that the final predicted output probability for you to pass this class is 0.1, or 10%. Right, so very bleak outcome. It's not a good outcome. Um, the actual probability is one, right? So, attending four out of the five lectures and spending 5 hours in your final project, you actually lived in a part of the feature space which was actually very positive, right? It looked like you were going to pass the class.",
267
+ "So, what happened here? Anyone have any ideas? Why did the neural network get this so terribly wrong? Right, it's not trained exactly. So, this neural network is not trained. We haven't shown it any data yet, so it's just making random guesses."
268
+ ],
269
+ "paragraph_timestamps": [
270
+ 2291,
271
+ 2310,
272
+ 2323,
273
+ 2356,
274
+ 2387,
275
+ 2437
276
+ ]
277
+ },
278
+ {
279
+ "num_chapter": 8,
280
+ "title": "Introduction to Neural Networks",
281
+ "start_paragraph_number": 90,
282
+ "end_paragraph_number": 106,
283
+ "start_time": 2451,
284
+ "end_time": 2872,
285
+ "paragraphs": [
286
+ "Let's break down the concept of neural networks like this:",
287
+ "' Neural networks are like babies, right? Before they see any data, they haven't learned anything. There's no expectation that they should be able to solve any problems before we teach them something about the world. So, let's teach this neural network something about the problem first.",
288
+ "To train it, we first need to tell our neural network when it's making bad decisions. We need to teach it, really train it, to learn exactly like how we as humans learn. We have to inform the neural network when it gets the answer incorrect, so it can learn how to get the answer correct.",
289
+ "The closer the answer is to the ground truth, the smaller the loss should be, and the more accurate the model should be. For example, if the actual value for passing a class is 100%, but the neural network predicted a probability of 0.1, we compute what's called a loss.",
290
+ "Assuming we have data from many students who have taken the class before, we can plug all of them into the neural network and show them to the system. We care not only about how the neural network did on just one prediction but also about how it predicted on all of these different people it has shown in the past.",
291
+ "During the training and learning process, we want to find a network that minimizes the empirical loss between our predictions and those ground truth outputs, and we're going to do this on average across all of the different inputs the model has seen.",
292
+ "In the case of binary classification, where the output is a zero or one probability, we can use the sigmoid function to output a probability between 0 and 1. However, when it comes to multi-class classification or regression, we need to use a different function. The softmax function is used to output a probability distribution over multiple classes. This function is used to determine how well the neural network is performing, and the cross-entropy loss function is used to measure the difference between the predicted and true probability distributions.",
293
+ "In the case of regression, where we want to predict a real-valued output, we can use a different loss function. For example, we can use the mean squared error (MSE) loss function to measure the difference between the predicted and true values. This loss function is calculated by computing the difference between the predicted and true values, squaring it, and then taking the average.",
294
+ "Now, let's put all of this loss information together with the problem of training our neural network. We know that we want to find a neural network that will solve this problem on average, over all the data. This means that we're trying to find the optimal weights for our neural network, which we can represent as a large vector W.",
295
+ "To compute this Vector W, we need to find the optimal weights based on all of the data that we have seen so far. The vector W is also going to determine what is the loss, which is the deviation from the ground truth of our network, based on where it should be.",
296
+ "Remember that W is just a group of a bunch of numbers - a very big list of numbers, a list of weights for every single layer and every single neuron in our neural network. We want to find that Vector W based on a lot of data, which is the problem of training a neural network.",
297
+ "Our loss function is just a simple function of our weights. If we have only two weights in our neural network, like we saw earlier in the slide, then we can plot the loss landscape over this two-dimensional space. We have two weights W1 and W2, and for every single configuration or setting of those two weights, our loss will have a particular value, which is the height of this graph.",
298
+ "What we want to do is find the lowest point, where the loss is as good as possible. So, the smaller the loss, the better. We want to find the lowest point in this graph.",
299
+ "To do that, we start somewhere in this space, and we don't know where to start, so let's pick a random place to start. From that place, we compute what's called the gradient of the landscape at that particular point. This is a very local estimate of where the slope is increasing, basically where the slope is increasing at my current location.",
300
+ "That informs us not only where the slope is increasing, but more importantly, where the slope is decreasing. If I negate the direction, if I go in the opposite direction, I will move towards the lowest point in the graph. I can actually step down into the landscape and change my weights such that I lower my loss. Let's take a small step just a small step in the opposite direction of the part that's going up. Let's take a small step going down, and we'll keep repeating this process. We'll compute a new gradient at that new point, and then we'll take another small step, and we'll keep doing this over and over and over again until we converge at what's called a local minimum.",
301
+ "Right, so based on where we started, it may not be a global minimum of everywhere in this lost landscape, but let's find ourselves now in a local minimum and we're guaranteed to actually converge by following this very simple algorithm at a local minimum."
302
+ ],
303
+ "paragraph_timestamps": [
304
+ 2451,
305
+ 2453,
306
+ 2470,
307
+ 2495,
308
+ 2519,
309
+ 2539,
310
+ 2560,
311
+ 2625,
312
+ 2675,
313
+ 2703,
314
+ 2727,
315
+ 2749,
316
+ 2788,
317
+ 2798,
318
+ 2823,
319
+ 2857
320
+ ]
321
+ },
322
+ {
323
+ "num_chapter": 9,
324
+ "title": "Gradient Descent Algorithm",
325
+ "start_paragraph_number": 106,
326
+ "end_paragraph_number": 112,
327
+ "start_time": 2872,
328
+ "end_time": 3192,
329
+ "paragraphs": [
330
+ "So, let's summarize now this algorithm. This algorithm is called gradient descent. Let's summarize it first in pseudo code, and then we'll look at it in actual code in a second. So, there's a few steps. First, we initialize our location somewhere randomly in this weight space. Right, we compute the gradient of our loss with respect to our weights. Okay, and then we take a small step in the opposite direction, and we keep repeating this in a loop over and over and over again. And we say we keep doing this until convergence, right, until we stop moving basically, and our network basically finds where it's supposed to end up.",
331
+ "We'll discuss this small step, right? So, we're multiplying our gradient by a small step. We'll discuss that more in the later part of this lecture, but for now, let's also quickly show the analogous part in code as well, and it mirrors very nicely, right? So, we'll randomly initialize our weight, which happens every time you train a neural network. You have to randomly initialize the weights, and then you have a loop, right here showing it without even convergence, right? We're just going to keep looping forever, where we say, okay, we're going to compute the loss at that location, compute the gradient, update the weights, and then we'll do it again.",
332
+ "So, which way is up, and then we just negate that gradient, multiply it by some what's called the learning rate (LR), denoted here, it's a small step, and then we take a direction in that small step. Let's take a deeper look at this term here. This is called the gradient, right? This tells us which way is up in that landscape, and this again tells us even more than that. It tells us how our landscape, how our loss, is changing as a function of all of our weights. But I actually haven't told you how to compute this, so let's talk about that process. That process is called backpropagation. We'll go through this very briefly, and we'll start with the simplest neural network, uh, that's possible, right? So, we already saw the simplest building block, which is a single neuron. Now, let's build the simplest neural network, which is just a one-neuron neural network, right? So, it has one hidden neuron, it goes from input to hidden neuron to output, and we want to compute the gradient of our loss with respect to this weight W2, okay? So, I'm highlighting it here. We have two weights, let's compute the gradient first with respect to W2, and that tells us how much does a small change in W2 affect our loss? Does our loss go up or down if we move our W2 a little bit in one direction or another?",
333
+ "Let's write out this derivative. We can start by applying the chain rule backwards from the loss through the output, and specifically, we can decompose this derivative, or gradient, into two parts. The first part is the derivative of the loss J with respect to our output y, multiplied by the derivative of y with respect to W2. This is possible because Y is only dependent on the previous layer. Now, let's suppose we don't want to do this for W2 but we want to do it for W1. We can use the exact same process, but now it's one step further. We'll now replace W2 with W1, and we need to apply the same process, or chain rule, yet again to decompose the problem further. Now, we propagate our old gradient that we computed for W2 all the way back one more step to the weight that we're interested in, which is W1. And we keep repeating this process over and over again, propagating these gradients backwards from output to input to compute ultimately what we want in the end: the derivative of our loss with respect to every weight in our neural network. This tells us how much does a small change in every single weight in our network affect the loss? Does our loss go up or down if we change this weight a little bit in this direction or a little bit in that direction?",
334
+ "There is no functional difference between a neuron and a perceptron; they are the same. Typically, people say \"neural network,\" which is why the term \"neuron\" has also gained popularity, but \"perceptron\" is the original, formal term.",
335
+ "So, now we've covered a lot. We've covered the forward propagation of information through a neuron and through a neural network all the way through, and we've covered now the back propagation of information to understand how we should change every single one of those weights in our neural network to improve our loss. So, that was the backpropagation algorithm in theory. It's actually pretty simple; it's just a chain rule. There's nothing more than just the chain rule. And the nice part is that deep learning libraries actually do this for you, so they compute backprop for you. You don't actually have to implement it yourself, which is very convenient."
336
+ ],
337
+ "paragraph_timestamps": [
338
+ 2872,
339
+ 2918,
340
+ 2951,
341
+ 3033,
342
+ 3127,
343
+ 3142
344
+ ]
345
+ },
346
+ {
347
+ "num_chapter": 10,
348
+ "title": "Optimizing Neural Networks",
349
+ "start_paragraph_number": 112,
350
+ "end_paragraph_number": 125,
351
+ "start_time": 3192,
352
+ "end_time": 3541,
353
+ "paragraphs": [
354
+ "But now it's important to touch on the fact that, although the theory behind backpropagation is not that complicated, the practice of optimizing neural networks is a completely different story. It's not straightforward at all, and in practice, it's very difficult and usually very computationally intensive to implement the backpropagation algorithm.",
355
+ "Here's an illustration from a paper that came out a few years ago, which attempted to visualize a very deep neural network's loss landscape. Previously, we had a depiction of how a neural network would look in a two-dimensional landscape, but real neural networks are not two-dimensional - they're hundreds or millions or billions of dimensions. Now, what would those loss landscapes look like? You can actually try some clever techniques to visualize them. This is one paper that attempted to do that, and it turns out that they look extremely messy.",
356
+ "The important thing is that, if you do this algorithm and you start in a bad place, depending on your neural network, you may not actually end up in the global solution. So, your initialization matters a lot, and you need to traverse these local minima and try to find the global minimum, or even more than that, you need to construct neural networks that have loss landscapes that are much more amenable to optimization than this one.",
357
+ "So, this is a very bad loss landscape. There are some techniques that we can apply to our neural networks that smooth out their loss landscapes and make them easier to optimize. Recall that update equation that we talked about earlier with gradient descent - the one with the parameter that we didn't talk about, which we described as the 'little step' that you could take. It's a small number that multiplies with the direction, which is your gradient. It just tells you, 'Okay, I'm not going to just go all the way in this direction; I'll just take a small step in this direction.'",
358
+ "In practice, even setting this value right is just one number, and setting this one number can be rather difficult. If we set the learning rate too high, we might overshoot the minimum, and if we set it too low, we might not make enough progress. So, finding the right learning rate is crucial for the optimization process.",
359
+ "Setting the learning rate too small can result in the model getting stuck in local minima, where it converges very slowly. On the other hand, if the learning rate is too large, it can overshoot and even diverge, causing the model to explode and never find a minimum.",
360
+ "Ideally, we want to use learning rates that are not too small and not too large, so they're large enough to avoid local minima but small enough to find their way into the global minimum. Something like this is what we should intuitively have in mind: a learning rate that can overshoot local minima but find its way into a better minimum and then stabilize itself there.",
361
+ "So, how do we actually set these learning rates in practice? What does that process look like?",
362
+ "One idea is to try a bunch of different learning rates and see what works. This is actually a not-bad process in practice, and it's one of the processes that people use. However, let's see if we can do something smarter than this and design algorithms that can adapt to the landscapes.",
363
+ "In practice, there's no reason why this should be a single number. Can we have learning rates that adapt to the model, the data, the landscapes, and the gradients that it's seeing around? This means that the learning rate may actually increase or decrease as a function of the gradients in the loss function, or how fast we're learning, or many other options.",
364
+ "There are many different ideas that could be done here, and in fact, there are many widely used different procedures or methodologies for setting the learning rate. During your labs, we actually encourage you to try out some of these different ideas for different types of learning rates and even play around with them. What's the effect of increasing or decreasing your learning rate? You'll see very striking differences. The question was: since it's on a closed interval, why not just test everything and find the absolute minimum?",
365
+ "So, a few things. Number one is that it's not a closed space. Right, so there's an infinite every every weight can be plus or minus up to infinity. Right, so even if it was a one-dimensional neural network with just one weight, it's not a closed space. In practice, it's even worse than that because you have billions of dimensions. Right, so not only is your space your support system in one dimension, it's infinite, but you now have billions of infinite dimensions. Right, or billions of uh infinite support spaces. So, it's not something that you can just like search every weight, every possible weight in your neural in your configuration, or what is every possible weight that this neural network could take and let me test them out. Because it's not practical to do even for a very small neural network in practice.",
366
+ "So, in your labs, you can really try to put all of this information uh in this picture into practice, which defines your model number one right here. Defines your Optimizer, which previously we denoted as this gradient descent Optimizer. Here, we're calling it uh stochastic gradient descent or SGD. We'll talk about that more in a second. And then also note that your Optimizer, which here we're calling SGD, could be any of these adaptive optimizers. You can swap them out, and you should swap them out. You should test different things here to see the impact of these different methods on your training procedure. And you'll gain very valuable intuition for the different insights that will come with that as well."
367
+ ],
368
+ "paragraph_timestamps": [
369
+ 3192,
370
+ 3210,
371
+ 3247,
372
+ 3273,
373
+ 3299,
374
+ 3310,
375
+ 3333,
376
+ 3362,
377
+ 3370,
378
+ 3390,
379
+ 3411,
380
+ 3445,
381
+ 3496
382
+ ]
383
+ },
384
+ {
385
+ "num_chapter": 11,
386
+ "title": "Stochastic Gradient Descent and Mini-Batching",
387
+ "start_paragraph_number": 125,
388
+ "end_paragraph_number": 131,
389
+ "start_time": 3541,
390
+ "end_time": 3748,
391
+ "paragraphs": [
392
+ "So, I want to continue briefly for the end of this lecture to discuss tips for training neural networks in practice and how we can focus on the powerful idea of batching data. Right, not seeing all of your data, but rather talking about a topic called batching. So, let's briefly revisit this gradient descent algorithm. The gradient is computed through this gradient computation, and the backprop algorithm I mentioned earlier is a very computationally expensive operation. It's even worse because we previously described it in a way where we would have to compute it over a summation over every single data point in our entire data set. That's how we defined it with the loss function - it's an average over all of our data points, which means that we're summing over all of our data points the gradients.",
393
+ "In most real-life problems, this would be completely infeasible to do because our data sets are simply too big and the models are too big to compute those gradients on every single iteration. Remember, this isn't just a one-time thing - it's every single step that you do. You keep taking small steps, so you keep needing to repeat this process.",
394
+ "Instead, let's define a new gradient descent algorithm called SGD, or stochastic gradient descent. Instead of computing the gradient over the entire data set, let's just pick a single training point and compute that one training point's gradient. The nice thing about that is that it's much easier to compute that gradient - it only needs one point. The downside is that it's very noisy, it's very stochastic, since it was computed using just that one example.",
395
+ "So, what's the middle ground? The middle ground is to take not one data point, nor the full data set, but a batch of data. This is called a mini-batch, which could be something in practice like 32 pieces of data, a common batch size. This gives us an estimate of the true gradient, so we approximate the gradient by averaging the gradient of these 32 samples. It's still fast because 32 is much smaller than the size of your entire data set, but it's pretty quick now. It's still noisy, but it's a good compromise.",
396
+ "In practice, mini-batches are usually used because you can still iterate much faster, and since B is normally not that large - think of something like in the tens or the hundreds of samples - it's very fast to compute in practice compared to regular gradient descent. It's also much more accurate compared to stochastic gradient descent. The increase in accuracy of this gradient estimation allows us to converge to our solution significantly faster, as well. It's not only about the speed; it's just about the increase in accuracy of those gradients, which allows us to get to our solution much faster, which ultimately means that we can train much faster as well and we can save compute.",
397
+ "The other really nice thing about mini-batches is that they allow for parallelizing our computation. Right, and that was a concept that we had talked about earlier in the class as well. Here's where it's coming in: we can split up those batches, right? So, those 32 pieces of data - let's say if our batch size is 32 - we can split them up onto different workers, right? Different parts of the GPU can tackle those different parts of our data points. This can allow us to basically achieve even more significant speed-ups using GPU architectures and GPU hardware."
398
+ ],
399
+ "paragraph_timestamps": [
400
+ 3541,
401
+ 3589,
402
+ 3609,
403
+ 3643,
404
+ 3673,
405
+ 3714
406
+ ]
407
+ },
408
+ {
409
+ "num_chapter": 12,
410
+ "title": "Understanding Overfitting and Regularization Techniques",
411
+ "start_paragraph_number": 131,
412
+ "end_paragraph_number": 146,
413
+ "start_time": 3748,
414
+ "end_time": 4136,
415
+ "paragraphs": [
416
+ "Finally, the last topic I want to talk about before we end this lecture and move on to lecture number two is overfitting. Overfitting is this idea that is actually not a deep learning-centric problem at all; it's a problem that exists in all of machine learning. The key problem is that it's actually one that addresses how you can accurately define if your model is actually capturing your true data set, right? Or if it's just learning kind of the subtle details that are kind of spurious correlating to your data set.",
417
+ "So, let's say we want to build models that can learn representations from our training data that still generalize to brand new, unseen test points. That's the real goal here - we want to teach our model something based on a lot of training data, but then we don't want it to do well in the training data; we want it to do well when we deploy it into the real world and it's seeing things that it has never seen during training.",
418
+ "The concept of overfitting is exactly addressing that problem. Overfitting means if your model is doing very well on your training data but very badly in testing, it's that means it's overfitting - it's overfitting to the training data that it saw. On the other hand, there's also underfitting. On the left-hand side, you can see basically not fitting the data enough, which means that you know you're going to achieve very similar performance on your testing distribution, but both are underperforming the actual capabilities of your system.",
419
+ "Ideally, you want to end up somewhere in the middle - not too complex, where you're memorizing all of the nuances in your training data, like on the right, but you still want to continue to perform well even based on brand new data, so you're not underfitting as well.",
420
+ "To actually address this problem in neural networks and in machine learning in general, there are a few different ways that you should be aware of and how to do it, because you'll need to apply them as part of your Solutions and your software Labs as well.",
421
+ "The key concept here is called regularization. Regularization is a technique that you can introduce, and simply put, all regularization is is a way to discourage your model from learning these nuances in your training data - that's all it is. And as we've seen before, it's actually critical for our models to be able to generalize - you know, not just on training data, but really what we care about is the testing data.",
422
+ "The most popular regularization technique that's important for you to understand is this very simple idea called Dropout. Let's revisit this picture of a deep neural network that we've been discussing.",
423
+ "In seeing all lectures right, Dropout is a training technique during training. What we're going to do is randomly set some of the activations, or these outputs of every single neuron, to zero. We're just randomly going to set them to zero with some probability. Right, so let's say 50% is our probability. That means that we're going to take all of the activations in our neural network and, with a probability of 50%, before we pass that activation onto the next neuron, we're just going to set it to zero and not pass on anything. So, effectively, 50% of the neurons are going to be kind of shut down or 'killed' in a forward pass, and you're only going to forward pass information with the other 50% of your neurons.",
424
+ "This idea is extremely powerful, actually, because it lowers the capacity of our neural network. It not only lowers the capacity of our neural network but it's dynamically lowering it because, on the next iteration, we're going to pick a different 50% of neurons that we drop out. So, constantly, the network is going to have to learn to build different pathways from input to output, and it can't rely on any small part of the features that are present in any part of the training dataset too extensively. Right, because it's constantly being forced to find these different pathways with random probabilities.",
425
+ "So, that's Dropout. The second regularization technique is going to be this notion called early stopping, which is actually something that is model-agnostic. You can apply this to any type of model as long as you have a testing set that you can play around with.",
426
+ "The idea here is that we have already a pretty formal mathematical definition of what it means to overfit. Overfitting is just when our model starts to perform worse on our test set. That's really all it is. So, what if we plot over the course of training? So, the x-axis is as we're training the model, and let's look at the performance on both the training set and the test set.",
427
+ "In the beginning, you can see that the model is performing really well on both the training set and the test set. But as we continue training, you can see that the model starts to perform better and better on the training set, but worse and worse on the test set. This is actually a sign of overfitting. So, what we can do is stop training when the model starts to perform worse on the test set. This is called early stopping.",
428
+ "By stopping training when the model starts to overfit, we can prevent overfitting and ensure that our model generalizes well to new, unseen data. That's because the training set and the test set are both going down, and they continue to go down, which is excellent because it means that our model is getting stronger. Eventually, though, what you'll notice is that the test loss plateaus and starts to increase. On the other hand, the training loss - there's no reason why the training loss should ever need to stop going down, right? Training losses generally always continue to decay as long as there is capacity in the neural network to learn those differences, right?",
429
+ "But the important point is that this continues for the rest of training, and we want to stop training basically right after this point. This is the really important point because this is where we need to stop training. Right after this point, this is the happy medium because after this point, we start to overfit on parts of the data where our training accuracy becomes actually better than our testing accuracy. So, our testing accuracy is getting worse, but our training accuracy is still improving, which means overfitting.",
430
+ "On the other hand, on the left-hand side, this is the opposite problem. We have not fully utilized the capacity of our model, and the testing accuracy can still improve further. This is a very powerful idea, but it's actually extremely easy to implement in practice because all you really have to do is just monitor the loss over the course of training, right? And you just have to pick the model where the testing accuracy starts to get worse."
431
+ ],
432
+ "paragraph_timestamps": [
433
+ 3748,
434
+ 3791,
435
+ 3814,
436
+ 3849,
437
+ 3864,
438
+ 3877,
439
+ 3905,
440
+ 3918,
441
+ 3964,
442
+ 4000,
443
+ 4011,
444
+ 4021,
445
+ 4031,
446
+ 4068,
447
+ 4098
448
+ ]
449
+ },
450
+ {
451
+ "num_chapter": 13,
452
+ "title": "Summary and Transition to Next Lecture",
453
+ "start_paragraph_number": 146,
454
+ "end_paragraph_number": 149,
455
+ "start_time": 4136,
456
+ "end_time": 4165,
457
+ "paragraphs": [
458
+ "I'll conclude this lecture by summarizing three key points we've covered so far. So far, we've learned the fundamental building blocks of neural networks, starting from a single neuron, also called a perceptron. We've seen how we can stack these systems on top of each other to create a hierarchical network, and how we can mathematically optimize those systems.",
459
+ "We've also covered different types of systems, and then, in the final part of the class, we discussed techniques and tips for training and applying these systems in practice.",
460
+ "Now, in the next lecture, we'll be hearing from Ava on deep sequence modeling using RNNs and a new and exciting algorithm called the Transformer, which is built on the principle of attention. You'll learn more about it in the next class, but for now, let's take a brief pause and resume in about five minutes so we can switch speakers and Ava can start her presentation. Okay, thank you."
461
+ ],
462
+ "paragraph_timestamps": [
463
+ 4136,
464
+ 4156,
465
+ 4165
466
+ ]
467
+ }
468
+ ]
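To make the chapter schema above easier to reuse, here is a minimal, illustrative sketch (not part of the repository) that loads a chapter file like the one above and prints its table of contents; it only assumes the fields shown in the JSON (num_chapter, title, start_time):

import json

def print_toc(json_path="ErnWZxJovaM.json"):
    # Load the chapter list produced by the pipeline (schema shown above).
    with open(json_path) as f:
        chapters = json.load(f)
    for chapter in chapters:
        start = int(chapter["start_time"])
        hms = f"{start // 3600:02}:{(start % 3600) // 60:02}:{start % 60:02}"
        print(f"{hms}  {chapter['num_chapter'] + 1} - {chapter['title']}")

print_toc()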
app.py CHANGED
@@ -4,133 +4,178 @@ import json
4
 
5
  from youtube_transcript_api import YouTubeTranscriptApi
6
 
7
- from openai import OpenAI
8
 
9
- import numpy as np
10
- from sklearn.feature_extraction.text import TfidfVectorizer
11
- from sklearn.metrics.pairwise import cosine_similarity
12
 
13
- def gradio_video_id_to_transcript(video_id):
 
 
 
14
 
15
- transcript = YouTubeTranscriptApi.get_transcript(video_id, languages=["en"])
16
- transcript_formatted = [{'start': entry['start'], 'text': entry['text']} for entry in transcript[0:10]]
17
- transcript_formatted_str = json.dumps(transcript_formatted, indent=2)+'...'
18
 
19
- return {output_transcript: transcript_formatted_str,
20
- gv_transcript: transcript}
 
 
21
 
22
- def gradio_transcript_to_paragraphs(gv_transcript_value):
 
 
23
 
24
- paragraphs, nb_input_tokens, nb_output_tokens, price = \
25
- transcript_to_paragraphs(gv_transcript_value, openai_client, openai_model, chunk_size=5000)
26
 
27
- paragraphs_formatted_str = json.dumps(paragraphs[0:4], indent=2)+'...'
28
 
29
- return {output_paragraphs: paragraphs_formatted_str,
30
- gv_paragraphs: paragraphs}
 
 
 
 
31
 
32
- def gradio_paragraphs_to_toc(gv_paragraphs_value):
33
 
34
- paragraphs_dict = gv_paragraphs_value
35
 
36
- json_toc, nb_input_tokens, nb_output_tokens, price = \
37
- paragraphs_to_toc(paragraphs_dict, openai_client, openai_model, chunk_size=100)
38
 
39
- json_toc_formatted_str = json.dumps(json_toc[0:4], indent=2)+'...'
 
40
 
41
- return {output_toc: json_toc_formatted_str,
42
- gv_toc: json_toc}
 
 
43
 
 
44
 
45
- def gradio_get_paragraphs_timestamps(gv_transcript_value, gv_paragraphs_value):
46
 
47
- paragraphs = add_timestamps_to_paragraphs(gv_transcript_value, gv_paragraphs_value, num_words=50)
 
48
 
49
- paragraphs_formatted_str = json.dumps(paragraphs[0:4], indent=2)+'...'
 
 
 
50
 
51
- return {output_paragraphs_timestamps: paragraphs_formatted_str,
52
- gv_paragraphs: paragraphs}
53
 
 
54
 
55
- def gradio_get_chapters(gv_paragraphs_value, gv_toc_value):
 
56
 
57
- chapters = get_chapters(gv_paragraphs_value, gv_toc_value)
58
 
59
- chapters_formatted_str = json.dumps(chapters[0:4], indent=2)+'...'
 
 
 
 
 
60
 
61
- return {output_chapters: chapters_formatted_str,
62
- gv_chapters: chapters}
63
 
 
64
 
65
- def gradio_get_markdown(gv_chapters_value):
66
 
67
- markdown = chapters_to_markdown(gv_chapters_value)
 
68
 
69
- return markdown
 
 
 
70
 
71
- with gr.Blocks() as app:
72
 
73
- gr.Markdown("## Get transcript")
74
 
75
- gv_transcript = gr.State()
76
- video_id_input = gr.Textbox(label="Video ID", value = "ErnWZxJovaM")
77
- get_transcript_button = gr.Button("Get transcript")
78
- output_transcript = gr.Textbox(label = "Transcript (JSON format - start, text)")
79
 
80
- get_transcript_button.click(gradio_video_id_to_transcript,
81
- inputs=[video_id_input],
82
- outputs=[output_transcript, gv_transcript])
 
83
 
84
- gr.Markdown("## Transcript to paragraphs")
85
 
86
- gv_paragraphs = gr.State()
87
- get_paragraphs_button = gr.Button("Get paragraphs")
88
- output_paragraphs = gr.Textbox(label = "Paragraphs (JSON format - paragraph_number, paragraph_text)")
89
 
90
- get_paragraphs_button.click(gradio_transcript_to_paragraphs,
91
- inputs=[gv_transcript],
92
- outputs=[output_paragraphs, gv_paragraphs])
93
 
94
- gr.Markdown("## Get table of content")
95
 
96
- gv_toc = gr.State()
97
- get_toc_button = gr.Button("Get table of contents")
98
- output_toc = gr.Textbox(label = "Table of content (JSON format - paragraph_number, title)")
 
 
 
 
 
 
 
 
99
 
100
- get_toc_button.click(gradio_paragraphs_to_toc,
101
- inputs=[gv_paragraphs],
102
- outputs=[output_toc, gv_toc])
103
 
 
 
 
 
 
 
 
104
 
105
- gr.Markdown("## Infer paragraph timestamps with TF-IDF")
106
 
107
- get_timestamps_button = gr.Button("Infer paragraph timestamps")
108
- output_paragraphs_timestamps = gr.Textbox(label = "Paragraphs (JSON format - paragraph_number, paragraph_text, start)")
109
 
110
- get_timestamps_button.click(gradio_get_paragraphs_timestamps,
111
- inputs=[gv_transcript, gv_paragraphs],
112
- outputs=[output_paragraphs_timestamps, gv_paragraphs])
113
 
114
- gr.Markdown("## Get chapters")
115
 
 
116
 
117
- gv_chapters = gr.State()
118
- get_chapters_button = gr.Button("Get chapters")
119
- output_chapters = gr.Textbox(label = "Chapters (JSON format)")
 
 
 
 
 
 
 
 
 
 
 
120
 
121
- get_chapters_button.click(gradio_get_chapters,
122
- inputs=[gv_paragraphs, gv_toc],
123
- outputs=[output_chapters, gv_chapters])
124
 
 
125
 
126
- gr.Markdown("## Markdown formatting")
127
 
128
- get_markdown_button = gr.Button("Markdown formatting")
129
- output_markdown = gr.Markdown(label = "Chapters (Markdown format)")
130
 
131
- get_markdown_button.click(gradio_get_markdown,
132
- inputs=[gv_chapters],
133
- outputs=[output_markdown])
 
 
134
 
 
135
 
136
- app.launch(debug=True)
 
4
 
5
  from youtube_transcript_api import YouTubeTranscriptApi
6
 
7
+ import utils
8
 
9
+ from openai import OpenAI
10
+ from groq import Groq
 
11
 
12
+ from dotenv import load_dotenv
13
+ load_dotenv()
14
+ GROQ_API_KEY = os.getenv("GROQ_API_KEY")
15
+ OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
16
 
17
+ #import importlib
18
+ #importlib.reload(utils)
 
19
 
20
+ def get_llm_client_and_model(llm_model):
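+ # Map a short model alias to an (LLM client, full model name) pair:
+ # "llama3-8b" -> Groq's llama3-8b-8192, "gpt-4o-mini" -> OpenAI's gpt-4o-mini-2024-07-18.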
21
+ if llm_model == "llama3-8b":
22
+ llm_client = Groq(api_key=GROQ_API_KEY)
23
+ llm_model = 'llama3-8b-8192'
24
 
25
+ elif llm_model == "gpt-4o-mini":
26
+ llm_client = OpenAI(api_key=OPENAI_API_KEY)
27
+ llm_model = 'gpt-4o-mini-2024-07-18'
28
 
29
+ return llm_client, llm_model
 
30
 
 
31
 
32
+ def gradio_process_video(video_id,
33
+ model_format_transcript, model_toc,
34
+ chunk_size_format_transcript, chunk_size_toc,
35
+ progress=gr.Progress()):
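+ # Full pipeline for one video: return precomputed chapters for known video IDs;
+ # otherwise fetch the transcript, rewrite it into paragraphs, infer paragraph
+ # timestamps with TF-IDF, build the table of contents, and render the result as HTML.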
36
+ if video_id in ["ErnWZxJovaM"]:
37
+ chapters = utils.load_json_chapters(video_id)
38
 
39
+ else:
40
 
41
+ transcript = YouTubeTranscriptApi.get_transcript(video_id, languages=["en"])
42
 
43
+ chunk_size_format_transcript = int(chunk_size_format_transcript)
 
44
 
45
+ llm_client_format_transcript, llm_model_format_transcript = \
46
+ get_llm_client_and_model(model_format_transcript)
47
 
48
+ paragraphs, nb_input_tokens, nb_output_tokens, price = \
49
+ utils.transcript_to_paragraphs(transcript, \
50
+ llm_client_format_transcript, llm_model_format_transcript, \
51
+ chunk_size=chunk_size_format_transcript, progress=progress)
52
 
53
+ paragraphs = utils.add_timestamps_to_paragraphs(transcript, paragraphs, num_words=50)
54
 
55
+ chunk_size_toc = int(chunk_size_toc)
56
 
57
+ llm_client_get_toc, llm_model_get_toc = \
58
+ get_llm_client_and_model(model_toc)
59
 
60
+ json_toc, nb_input_tokens, nb_output_tokens, price = \
61
+ utils.paragraphs_to_toc(paragraphs, \
62
+ llm_client_get_toc, llm_model_get_toc, \
63
+ chunk_size=chunk_size_toc)
64
 
65
+ chapters = utils.get_chapters(paragraphs, json_toc)
 
66
 
67
+ output_html = utils.get_result_as_html(chapters, video_id)
68
 
69
+ return {output_processing: str(output_html),
70
+ gv_output: output_html}
71
 
 
72
 
 
112
 
 
113
 
114
+ # %%
115
+ css = """
116
+ .content {
117
+ padding: 20px;
118
+ max-width: 800px;
119
+ margin: 0 auto;
120
+ background-color: #ffffff;
121
+ box-shadow: 0 0 10px rgba(0, 0, 0, 0.1);
122
+ border-radius: 8px;
123
+ }
124
+ """
125
 
126
+ example_video_id = "ErnWZxJovaM"
127
+ example_chapters = utils.load_json_chapters(example_video_id)
128
+ example_output_html = utils.get_result_as_html(example_chapters, example_video_id)
129
 
130
+ with gr.Blocks(css=css) as app:
131
+ gr.HTML("<div align='center'><h1 class='header'>Demo: Automatic video chaptering with LLMs and TF-IDF</h1></div>")
132
+ gr.HTML("<div align='center'><h3 class='header'>From raw transcript to structured document</h3></div>")
133
+ gr.HTML("<hr>")
134
+ gr.Markdown("""This demo relies on
135
+ - Groq's Llama 3 8B for transcript preprocessing
136
+ - OpenAI's GPT-4o-mini for chaptering. Note: Using GPT-4o-mini for transcript preprocessing will improve results, but takes longer (around 2-3 minutes for a one-hour video)
137
 
138
+ The following YouTube video IDs are already preprocessed (copy and paste an ID in the box below):
139
 
140
+ - `ErnWZxJovaM`: [MIT course](https://www.youtube.com/watch?v=ErnWZxJovaM)
141
+ - `EuC1GWhQdKE`: [Anthropic](https://www.youtube.com/watch?v=EuC1GWhQdKE)
142
 
143
+ Check the [Medium article]() for more details"""
144
+ )
 
145
 
146
+ gv_transcript = gr.State()
147
 
148
+ video_id_input = gr.Textbox(label="Enter YouTube Video ID", value="EuC1GWhQdKE")
149
 
150
+ with gr.Accordion("Set parameters", open=False):
151
+ with gr.Row():
152
+ with gr.Column(scale=1):
153
+ model_format_transcript = gr.Dropdown(
154
+ [("LLama 3 8B (Groq)", "llama3-8b"), ("GPT-4o-mini (OpenAI)", "gpt-4o-mini")],
155
+ label="Transcript preprocessing", value="llama3-8b", interactive=True)
156
+ chunk_size_format_transcript = gr.Textbox(label="Preprocessing chunk size", value=2000)
157
+ with gr.Column(scale=1):
158
+ model_toc = gr.Dropdown([("LLama 3 8B (Groq)", "llama3-8b"), ("GPT-4o-mini (OpenAI)", "gpt-4o-mini")],
159
+ label="Chaptering", value="gpt-4o-mini", interactive=True)
160
+ chunk_size_toc = gr.Textbox(label="Chaptering chunk size", value=30)
161
+ with gr.Column(scale=1):
162
+ api_key_openai = gr.Textbox(label="OpenAI API Key", value="xxx")
163
+ api_key_groq = gr.Textbox(label="Groq API Key", value="xxx")
164
 
165
+ processing_button = gr.Button("Process transcript")
 
 
166
 
167
+ gv_output = gr.State()
168
 
169
+ gr.HTML("<hr>")
170
 
171
+ output_processing = gr.HTML(label="Output processing", value=example_output_html)
 
172
 
173
+ processing_button.click(gradio_process_video,
174
+ inputs=[video_id_input,
175
+ model_format_transcript, model_toc,
176
+ chunk_size_format_transcript, chunk_size_toc],
177
+ outputs=[output_processing, gv_output])
178
 
179
+ # gr.HTML(result_as_html)
180
 
181
+ app.launch(debug=True, width="100%")
utils.py ADDED
@@ -0,0 +1,426 @@
1
+ import json
2
+ import re
3
+
4
+ import numpy as np
5
+ from sklearn.feature_extraction.text import TfidfVectorizer
6
+ from sklearn.metrics.pairwise import cosine_similarity
7
+
8
+
9
+ ########################### LLM call ###########################
10
+
11
+ price_token={'gpt-4o': {'input': 5/1000000, 'output': 15/1000000},
12
+ 'gpt-4o-2024-08-06': {'input': 2.5/1000000, 'output': 10/1000000},
13
+ 'gpt-4o-mini-2024-07-18': {'input': 0.15/1000000, 'output': 0.6/1000000},
14
+ 'llama3-8b-8192' : {'input': 0.05 / 1000000, 'output': 0.08 / 1000000},
15
+ 'llama3-70b-8192' : {'input': 0.59 / 1000000, 'output': 0.79 / 1000000},
16
+ 'claude-3-5-sonnet-20240620': {'input': 3/1000000, 'output': 15/1000000},
17
+ 'claude-3-haiku-20240307': {'input': 0.25/1000000, 'output': 1.25/1000000},
18
+ }
19
+ def call_llm(client, model, system_prompt, prompt,
20
+ temperature=0, seed=42, response_format=None, max_tokens=5000):
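+ # Single chat-completion call (works with OpenAI-compatible clients such as Groq):
+ # sends a system + user message pair and returns the response text together with
+ # the number of input/output tokens and the estimated price in USD.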
21
+
22
+ response = client.chat.completions.create(
23
+ messages=[
24
+ {
25
+ "role": "system",
26
+ "content": system_prompt
27
+ },
28
+ {
29
+ "role": "user",
30
+ "content": prompt
31
+ }
32
+ ],
33
+ model=model,
34
+ temperature=temperature,
35
+ seed=seed,
36
+ response_format=response_format,
37
+ max_tokens=max_tokens
38
+ )
39
+
40
+ nb_input_tokens = response.usage.prompt_tokens
41
+ nb_output_tokens = response.usage.completion_tokens
42
+ price = nb_input_tokens * price_token[model]['input'] + nb_output_tokens * price_token[model]['output']
43
+
44
+ print(f"input tokens: {nb_input_tokens}; output tokens: {nb_output_tokens}, price: {price}")
45
+
46
+ response_content=response.choices[0].message.content
47
+
48
+ return response_content, nb_input_tokens, nb_output_tokens, price
49
+
50
+ ########################### Step 2: Transcript to paragraph ###########################
51
+
52
+ system_prompt_transcript_to_paragraphs = f"""
53
+
54
+ You are a helpful assistant.
55
+
56
+ Your task is to improve the user input's readability: add punctuation if needed, remove verbal tics, correct grammatical errors, and add appropriate line breaks with '\n\n'.
57
+
58
+ Put your answer within <answer></answer> tags.
59
+
60
+ """
61
+
62
+
63
+
64
+ def transcript_to_paragraphs(transcript, llm_client, llm_model, chunk_size=5000, progress=None):
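+ # Rewrite the raw transcript into readable paragraphs: process the text in chunks
+ # of ~chunk_size characters, let the LLM add punctuation and '\n\n' breaks, and
+ # carry the last (possibly truncated) paragraph of each chunk over to the next one
+ # so that paragraphs are not cut in half. Returns (paragraphs, input tokens,
+ # output tokens, price).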
65
+
66
+ transcript_as_text = ' '.join([s['text'] for s in transcript])
67
+
68
+ paragraphs = []
69
+ last_paragraph = ""
70
+
71
+ total_nb_input_tokens, total_nb_output_tokens, total_price = 0, 0, 0
72
+
73
+ nb_chunks = int(len(transcript_as_text) / chunk_size) + 1
74
+ progress_i = 0
75
+ print(f"Number of chunks: {nb_chunks}")
76
+
77
+ # for i in range(0, 10000, chunk_size):
78
+ for i in range(0, len(transcript_as_text), chunk_size):
79
+
80
+ print("i is: " + str(i))
81
+
82
+ chunk = last_paragraph + " " + transcript_as_text[i:i + chunk_size]
83
+
84
+ if progress is not None:
85
+ progress_i += 1
86
+ progress(progress_i / nb_chunks, desc="Processing")
87
+
88
+ found_edited_transcript = False
89
+
90
+ while not found_edited_transcript:
91
+
92
+ response_content, nb_input_tokens, nb_output_tokens, price = \
93
+ call_llm(llm_client, llm_model,
94
+ system_prompt=system_prompt_transcript_to_paragraphs, prompt=chunk,
95
+ temperature=0.2, seed=42, response_format=None)
96
+
97
+ if not "</answer>" in response_content:
98
+ response_content += "</answer>"
99
+
100
+ # Extract the content from the <answer></answer> tags
101
+ pattern = re.compile(r'<answer>(.*?)</answer>', re.DOTALL)
102
+ response_content_edited = pattern.findall(response_content)
103
+
104
+ if len(response_content_edited) > 0:
105
+ found_edited_transcript = True
106
+ response_content_edited = response_content_edited[0]
107
+
108
+ else:
109
+ print("No edited transcript found. Trying again.")
110
+ print(response_content[0:100])
111
+ print(response_content[-100:])
112
+
113
+ total_nb_input_tokens += nb_input_tokens
114
+ total_nb_output_tokens += nb_output_tokens
115
+ total_price += price
116
+
117
+ paragraphs_chunk = response_content_edited.strip().split('\n\n')
118
+
119
+ print('Found paragraphs:', len(paragraphs_chunk))
120
+ last_paragraph = paragraphs_chunk[-1]
121
+
122
+ paragraphs += paragraphs_chunk[:-1]
123
+
124
+ paragraphs += [last_paragraph]
125
+
126
+ paragraphs_dict = [{'paragraph_number': i, 'paragraph_text': paragraph} for i, paragraph in enumerate(paragraphs)]
127
+
128
+ return paragraphs_dict, total_nb_input_tokens, total_nb_output_tokens, total_price
129
+
130
+ ########################### Step 3: Infer timestamps ###########################
131
+
132
+ def transform_text_segments(text_segments, num_words=50):
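+ # For each segment, build a string of roughly the first num_words words starting
+ # at that segment, borrowing words from the following segments when needed.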
133
+ # Initialize variables
134
+ transformed_segments = []
135
+ current_index = 0
136
+ num_segments = len(text_segments)
137
+
138
+ for i in range(num_segments):
139
+
140
+ current_index = i
141
+
142
+ # Get the current segment's starting timestamp and text
143
+ current_segment = text_segments[current_index]
144
+ current_text = current_segment['text']
145
+
146
+ # Initialize a list to hold the combined text
147
+ combined_text = " ".join(current_text.split()[:num_words])
148
+ number_words_collected = len(current_text.split())
149
+
150
+ # Collect words from subsequent segments
151
+ while number_words_collected < num_words and (current_index + 1) < num_segments:
152
+ current_index += 1
153
+ next_segment = text_segments[current_index]
154
+ next_text = next_segment['text']
155
+ next_words = next_text.split()
156
+
157
+ # Append words from the next segment
158
+ if number_words_collected + len(next_words) <= num_words:
159
+ combined_text += ' ' + next_text
160
+ number_words_collected += len(next_words)
161
+ else:
162
+ # Only append enough words to reach the num_words limit
163
+ words_needed = num_words - number_words_collected
164
+ combined_text += ' ' + ' '.join(next_words[:words_needed])
165
+ number_words_collected = num_words
166
+
167
+ # Append the combined segment to the result
168
+ transformed_segments.append(combined_text)
169
+
170
+ return transformed_segments
171
+
172
+
173
+ def add_timestamps_to_paragraphs(transcript, paragraphs, num_words=50):
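+ # Infer a start time for each LLM-generated paragraph: turn both the timed
+ # transcript segments and the paragraphs into num_words-word windows, vectorize
+ # them with TF-IDF, and assign each paragraph the start time of its most
+ # cosine-similar transcript segment (shifted back by 2 seconds, floored at 0).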
174
+ list_indices = []
175
+
176
+ transcript_num_words = transform_text_segments(transcript, num_words=num_words)
177
+
178
+ paragraphs_start_text = [{"start": p['paragraph_number'], "text": p['paragraph_text']} for p in paragraphs]
179
+ paragraphs_num_words = transform_text_segments(paragraphs_start_text, num_words=num_words)
180
+
181
+ # Create a TF-IDF vectorizer
182
+ vectorizer = TfidfVectorizer().fit_transform(transcript_num_words + paragraphs_num_words)
183
+ # Get the TF-IDF vectors for the transcript and the excerpt
184
+ vectors = vectorizer.toarray()
185
+
186
+ for i in range(len(paragraphs_num_words)):
187
+
188
+ # Extract the TF-IDF vector for the paragraph
189
+ paragraph_vector = vectors[len(transcript_num_words) + i]
190
+
191
+ # Calculate the cosine similarity between the paragraph vector and each transcript chunk
192
+ similarities = cosine_similarity(vectors[:len(transcript_num_words)], paragraph_vector.reshape(1, -1))
193
+ # Find the index of the most similar chunk
194
+ best_match_index = int(np.argmax(similarities))
195
+
196
+ list_indices.append(best_match_index)
197
+
198
+ paragraphs[i]['matched_index'] = best_match_index
199
+ paragraphs[i]['matched_text'] = transcript[best_match_index]['text']
200
+ paragraphs[i]['start_time'] = int(transcript[best_match_index]['start']) - 2
201
+ if paragraphs[i]['start_time'] < 0:
202
+ paragraphs[i]['start_time'] = 0
203
+
204
+ return paragraphs
205
+
206
+ ########################### Step 4: Generate table of content ###########################
207
+
208
+
209
+ system_prompt_paragraphs_to_toc = """
210
+
211
+ You are a helpful assistant.
212
+
213
+ You are given a transcript of a course in JSON format as a list of paragraphs, each containing 'paragraph_number' and 'paragraph_text' keys.
214
+
215
+ Your task is to group consecutive paragraphs in chapters for the course and identify meaningful chapter titles.
216
+
217
+ Here are the steps to follow:
218
+
219
+ 1. Read the transcript carefully to understand its general structure and the main topics covered.
220
+ 2. Look for clues that a new chapter is about to start. This could be a change of topic, a change of time or setting, the introduction of new themes or topics, or the speaker's explicit mention of a new part.
221
+ 3. For each chapter, keep track of the paragraph number that starts the chapter and identify a meaningful chapter title.
222
+ 4. Chapters should ideally be equally spaced throughout the transcript, and discuss a specific topic.
223
+ 5. A chapter MUST have more than 4 paragraphs.
224
+
225
+ Format your result in JSON, with a list dictionaries for chapters, with 'start_paragraph_number':integer and 'title':string as key:value.
226
+
227
+ Example:
228
+ {"chapters":
229
+ [{"start_paragraph_number": 0, "title": "Introduction"},
230
+ {"start_paragraph_number": 10, "title": "Chapter 1"}
231
+ ]
232
+ }
233
+
234
+ """
235
+
236
+
237
+ def paragraphs_to_toc(paragraphs, llm_client, llm_model, chunk_size=100):
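+ # Build the table of contents chunk by chunk: send chunk_size paragraphs at a time
+ # to the LLM and restart each new chunk at the last chapter boundary returned, so
+ # that the final (possibly incomplete) chapter of a chunk is re-evaluated with more context.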
238
+ chapters = []
239
+ number_last_chapter = 0
240
+
241
+ total_nb_input_tokens, total_nb_output_tokens, total_price = 0, 0, 0
242
+
243
+ while number_last_chapter < len(paragraphs):
244
+
245
+ print(number_last_chapter)
246
+
247
+ chunk = paragraphs[number_last_chapter:(number_last_chapter + chunk_size)]
248
+ chunk = [{'paragraph_number': p['paragraph_number'], 'paragraph_text': p['paragraph_text']} for p in chunk]
249
+
250
+ chunk_json_dump = json.dumps(chunk)
251
+
252
+ content, nb_input_tokens, nb_output_tokens, price = call_llm( \
253
+ llm_client, llm_model, \
254
+ system_prompt_paragraphs_to_toc, chunk_json_dump, \
255
+ temperature=0, seed=42, response_format={"type": "json_object"})
256
+
257
+ total_nb_input_tokens += nb_input_tokens
258
+ total_nb_output_tokens += nb_output_tokens
259
+
260
+ chapters_chunk = json.loads(content)['chapters']
261
+
262
+ if number_last_chapter == chapters_chunk[-1]['start_paragraph_number']:
263
+ break
264
+
265
+ chapters += chapters_chunk[:-1]
266
+
267
+ number_last_chapter = chapters_chunk[-1]['start_paragraph_number']
268
+ if number_last_chapter >= len(paragraphs) - 5:
269
+ break
270
+
271
+ total_price = (total_nb_input_tokens * price_token[llm_model]['input'] +
272
+ total_nb_output_tokens * price_token[llm_model]['output'])
273
+
274
+ chapters += [chapters_chunk[-1]]
275
+
276
+ return chapters, total_nb_input_tokens, total_nb_output_tokens, total_price
277
+
278
+
279
+ ########################### Step 5: Chapter rendering functions ###########################
280
+
281
+ def get_chapters(paragraphs, table_of_content):
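+ # Merge the table of contents with the paragraph data: each chapter gets its title,
+ # start/end paragraph numbers, start/end times, and the corresponding paragraph
+ # texts and timestamps (the schema stored in ErnWZxJovaM.json).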
282
+
283
+ chapters = []
284
+
285
+ for i in range(len(table_of_content)):
286
+
287
+
288
+ if i < len(table_of_content) - 1:
289
+
290
+ chapter = {'num_chapter': i,
291
+ 'title': table_of_content[i]['title'],
292
+ 'start_paragraph_number': table_of_content[i]['start_paragraph_number'],
293
+ 'end_paragraph_number': table_of_content[i + 1]['start_paragraph_number'],
294
+ 'start_time': paragraphs[table_of_content[i]['start_paragraph_number']]['start_time'],
295
+ 'end_time': paragraphs[table_of_content[i + 1]['start_paragraph_number']]['start_time'],
296
+ }
297
+
298
+ else:
299
+ chapter = {'num_chapter': i,
300
+ 'title': table_of_content[i]['title'],
301
+ 'start_paragraph_number': table_of_content[i]['start_paragraph_number'],
302
+ 'end_paragraph_number': len(paragraphs),
303
+ 'start_time': paragraphs[table_of_content[i]['start_paragraph_number']]['start_time'],
304
+ 'end_time': paragraphs[-1]['start_time'],
305
+ }
306
+
307
+ paragraphs_chapter = [paragraphs[j]['paragraph_text'] for j in
308
+ range(chapter['start_paragraph_number'], chapter['end_paragraph_number'])]
309
+
310
+ paragraph_timestamps_chapter = [paragraphs[j]['start_time'] for j in
311
+ range(chapter['start_paragraph_number'], chapter['end_paragraph_number'])]
312
+
313
+ chapter['paragraphs'] = paragraphs_chapter
314
+ chapter['paragraph_timestamps'] = paragraph_timestamps_chapter
315
+
316
+ chapters.append(chapter)
317
+
318
+ return chapters
319
+
320
+ def convert_seconds_to_hms(seconds):
321
+ # Calculate hours, minutes, and remaining seconds
322
+ hours = seconds // 3600
323
+ minutes = (seconds % 3600) // 60
324
+ remaining_seconds = seconds % 60
325
+
326
+ # Format the result as HH:MM:SS
327
+ return f"{hours:02}:{minutes:02}:{remaining_seconds:02}"
328
+
329
+ def toc_to_html(chapters):
330
+
331
+ toc_html = "<h1>Video chapters</h1><p>\n"
332
+
333
+ for chapter in chapters:
334
+ num_chapter = chapter['num_chapter']
335
+ title = chapter['title']
336
+
337
+ from_to = convert_seconds_to_hms(int(chapter['start_time'])) + " - "
338
+
339
+ toc_html += f"""{from_to}<a href = "#{num_chapter}" >{num_chapter+1} - {title}</a><br>\n"""
340
+
341
+ return toc_html
342
+
343
+
344
+ def section_to_html(section_json_data):
345
+ formatted_section = ""
346
+
347
+ paragraphs = section_json_data['paragraphs']
348
+ paragraphs_timestamp_hms = [convert_seconds_to_hms(int(section_json_data['paragraph_timestamps'][i])) for i in range(len(paragraphs))]
349
+
350
+ for i, (paragraph, paragraph_timestamp_hms) in enumerate(zip(paragraphs, paragraphs_timestamp_hms)):
351
+
352
+ formatted_section += f"""
353
+ <div class="row mb-4">
354
+ <div class="col-md-1">
355
+ {paragraph_timestamp_hms}
356
+ </div>
357
+ <div class="col-md-11">
358
+ <p>{paragraph}</p>
359
+ </div>
360
+ </div>"""
361
+
362
+ num_section = section_json_data['num_chapter']
363
+
364
+ from_to = "From "+convert_seconds_to_hms(int(section_json_data['start_time'])) + " to " + convert_seconds_to_hms(
365
+ int(section_json_data['end_time']))
366
+
367
+ title = f"{section_json_data['title']}"
368
+
369
+ title_link = f"""<div class="transcript-title-icon" " id="{num_section}">{num_section+1} - {title}</div>"""
370
+
371
+ summary_section = f"""
372
+ <h2>{title_link}</h2>
373
+ {from_to}
374
+ <p>
375
+ <div class="summary-section">
376
+ <div class="summary-text" >
377
+ {formatted_section}
378
+ </div>
379
+ </div>
380
+ """
381
+
382
+ return summary_section
383
+
384
+
385
+ def get_result_as_html(chapters, video_id):
386
+ video_embed = f"""
387
+ <iframe width="100%" height="400" src="https://www.youtube.com/embed/{video_id}" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
388
+ """
389
+
390
+ toc = toc_to_html(chapters)
391
+
392
+ edited_transcript = f"""
393
+ <h1>Structured transcript</h1>
394
+ <p>
395
+ """
396
+
397
+ for i in range(len(chapters)):
398
+ chapter_json_data = chapters[i]
399
+
400
+ edited_transcript += section_to_html(chapter_json_data)
401
+
402
+ result_as_html = f"""
403
+ <link href="https://cdn.jsdelivr.net/npm/[email protected]/dist/css/bootstrap.min.css" rel="stylesheet">
404
+ <div class="container mt-4">
405
+ <div class="content">
406
+ {video_embed}
407
+ </div>
408
+ <p>
409
+ <div class="content">
410
+ {toc}
411
+ </div>
412
+ <p>
413
+ <div class="content">
414
+ {edited_transcript}
415
+ </div>
416
+ </div>"""
417
+
418
+ return result_as_html
419
+
420
+ def load_json_chapters(video_id):
421
+ file_name = f"{video_id}.json"
422
+ with open(file_name, 'r') as file:
423
+ chapters = json.load(file)
424
+
425
+ return chapters
426
+
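For reference, here is a minimal, illustrative sketch (not part of the repository) of how these helpers chain together outside of Gradio. It assumes an OpenAI client with OPENAI_API_KEY set in the environment; the model name and video ID below are placeholders taken from the files above:

from openai import OpenAI
from youtube_transcript_api import YouTubeTranscriptApi

import utils

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
model = "gpt-4o-mini-2024-07-18"
video_id = "ErnWZxJovaM"

# Raw transcript -> cleaned paragraphs -> timestamps -> table of contents -> chapters -> HTML
transcript = YouTubeTranscriptApi.get_transcript(video_id, languages=["en"])
paragraphs, _, _, _ = utils.transcript_to_paragraphs(transcript, client, model, chunk_size=2000)
paragraphs = utils.add_timestamps_to_paragraphs(transcript, paragraphs, num_words=50)
toc, _, _, _ = utils.paragraphs_to_toc(paragraphs, client, model, chunk_size=30)
chapters = utils.get_chapters(paragraphs, toc)
html = utils.get_result_as_html(chapters, video_id)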