1
00:00:00,470 --> 00:00:01,000
OK.
2
00:00:01,050 --> 00:00:02,430
So let's start at the beginning.
3
00:00:02,460 --> 00:00:05,520
Let's talk about object detection, or really object detectors.
4
00:00:05,670 --> 00:00:11,910
So I'm going to introduce you to the history of it. So firstly, object detection is one of the holy grails of
5
00:00:11,910 --> 00:00:17,610
computer vision because previously what we have been doing is just classifying like an entire image
6
00:00:17,610 --> 00:00:20,510
and seeing what class it belongs to.
7
00:00:20,730 --> 00:00:26,490
But can we take an image like this and label each major component as being a dog, car, person, horse,
8
00:00:26,760 --> 00:00:28,340
person in the back?
9
00:00:28,350 --> 00:00:32,230
Not yet, not until we have come across object detection.
10
00:00:32,640 --> 00:00:40,620
So object detection is a mix of object classification and localization. Object detection is the identification
11
00:00:40,650 --> 00:00:43,120
of a bounding box outlining the object.
12
00:00:43,140 --> 00:00:49,590
So like with my face here, basically it's extracting a bounding box around my face, and face detection is perhaps
13
00:00:49,590 --> 00:00:53,760
one of the most popular object detection algorithms that we all know.
14
00:00:53,830 --> 00:00:57,220
We're all quite familiar with it from using cameras in our cell phones.
15
00:00:57,270 --> 00:00:57,780
OK.
16
00:00:58,290 --> 00:01:04,150
So basically, instead of just telling you this object here is a cat,
17
00:01:04,170 --> 00:01:09,070
it actually tells you where the cat is, and that is the whole point of object detection.
18
00:01:10,620 --> 00:01:15,340
So let's get into the history of it and start with Haar cascade classifiers.
19
00:01:15,360 --> 00:01:19,140
Now there were many object detectors before this.
20
00:01:19,140 --> 00:01:24,840
However, Haar cascade classifiers are what made it mainstream and quite popular because
21
00:01:24,840 --> 00:01:26,340
it was so fast.
22
00:01:26,370 --> 00:01:33,420
So basically this was developed by Viola and Jones in their face detection algorithm in 2001, not
23
00:01:33,420 --> 00:01:35,480
that long ago, 17 years ago,
24
00:01:35,520 --> 00:01:40,960
to be fair, and it was super fast and it's actually still used in a number of applications.
25
00:01:41,280 --> 00:01:43,710
Basically it's been optimized and tweaked to be even faster.
26
00:01:43,710 --> 00:01:49,890
So it basically reduces the CPU load and it's very, very accurate.
27
00:01:49,890 --> 00:01:52,930
Basically what it does it's a cascade of classifiers.
28
00:01:53,190 --> 00:01:56,640
That's basically how it got its name, and it uses Haar features.
29
00:01:56,640 --> 00:01:58,590
Basically let's go into the next slide.
30
00:01:58,660 --> 00:02:02,760
Actually I don't have it in this section, but it basically uses Haar features, and Haar features are
31
00:02:02,760 --> 00:02:06,210
basically like you have rectangles
32
00:02:06,250 --> 00:02:07,100
overlaying here.
33
00:02:07,240 --> 00:02:12,690
You imagine a white rectangle here and one here, and then there are different types of Haar cascade classifiers.
34
00:02:12,810 --> 00:02:15,590
So basically it's just feature extraction,
35
00:02:15,690 --> 00:02:22,350
basically what we learned before, and it slides this box over the image over and over, continuously
36
00:02:22,410 --> 00:02:31,950
looking for a face. They're very good, but they are pretty hard to train, develop and optimize.
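A rough sketch of the rectangle idea described above, in plain Python. It computes one two-rectangle Haar-like feature: the sum of the pixels under the left half of a box minus the sum under the right half. The integral image (summed-area table) used to make the box sums cheap comes from the Viola-Jones method and is not covered in this lecture; the function names and the tiny 4 by 4 test image are made up for illustration.

```python
def integral_image(img):
    """Summed-area table with an extra leading row and column of zeros."""
    rows, cols = len(img), len(img[0])
    ii = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        row_sum = 0
        for c in range(cols):
            row_sum += img[r][c]
            # Everything above this row, plus this row up to column c.
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii


def box_sum(ii, top, left, height, width):
    """Sum of the pixels inside a rectangle, using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def two_rect_feature(ii, top, left, height, width):
    """Left half minus right half of a box (width assumed to be even)."""
    half = width // 2
    left_half = box_sum(ii, top, left, height, half)
    right_half = box_sum(ii, top, left + half, height, half)
    return left_half - right_half


# Hypothetical 4x4 image: bright left half, dark right half.
img = [[10, 10, 0, 0] for _ in range(4)]
ii = integral_image(img)
feature = two_rect_feature(ii, 0, 0, 4, 4)
```

A large positive or negative value means the box straddles a strong vertical edge, which is the kind of pattern a cascade stage would threshold on.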
37
00:02:32,010 --> 00:02:38,010
So let's move on to histogram of gradients and SVM sliding windows. So sliding windows is a method
38
00:02:38,010 --> 00:02:43,580
where we extract segments of a full image piece by piece in the form of a rectangular extractor box.
39
00:02:43,590 --> 00:02:48,000
So I mentioned it in the previous slide when I was talking about this box being slid across this image.
40
00:02:48,330 --> 00:02:53,430
What it does here, and this image is a picture of my wife from the last bodybuilding bikini competition
41
00:02:53,430 --> 00:02:54,560
two months ago.
42
00:02:54,870 --> 00:03:02,550
And what it does is just imagine this window is being moved here then down here and then down here just
43
00:03:02,550 --> 00:03:05,670
like, remember how we moved across the image
44
00:03:05,680 --> 00:03:07,960
in CNNs? It's exactly the same thing.
45
00:03:07,970 --> 00:03:14,430
And we can actually set the same parameters like stride and the size of this box and what this box does
46
00:03:14,430 --> 00:03:17,640
here in sliding windows with histogram of gradients and
47
00:03:17,700 --> 00:03:25,980
SVM is that it basically extracts the HOGs, all the histograms of gradients, in this box at different scales.
48
00:03:25,980 --> 00:03:31,620
So basically it does it with the image at one scale and then another scale, a smaller scale, and then this one
49
00:03:31,620 --> 00:03:35,480
here, and this one basically has no room to go right, so it just goes straight down.
50
00:03:35,760 --> 00:03:39,480
And it tries to match up the HOG gradients with what it knows
51
00:03:39,480 --> 00:03:41,700
it's supposed to look like to find the object.
52
00:03:42,000 --> 00:03:47,400
Now as you can see this could be an effective way but it's not really that resilient.
53
00:03:47,400 --> 00:03:48,410
Why?
54
00:03:48,420 --> 00:03:53,400
Because imagine we have to do this for every segment of the image continuously.
55
00:03:53,400 --> 00:03:55,680
It gets exhaustive and computationally expensive.
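The box movement described above can be sketched in a few lines of plain Python. This only generates the top-left corner of every box position for a given box size and stride; extracting HOG features and scoring each box with an SVM would happen inside the loop and is left out. The function name and the 8 by 6 example image are assumptions for illustration.

```python
def sliding_window(image_w, image_h, box_w, box_h, stride):
    """Yield the (x, y) top-left corner of every box position.

    The box moves left to right by `stride` pixels, then drops down by
    `stride` and starts again from the left edge.
    """
    for y in range(0, image_h - box_h + 1, stride):
        for x in range(0, image_w - box_w + 1, stride):
            yield x, y


# Hypothetical 8x6 image, 4x4 box, stride of 2.
positions = list(sliding_window(8, 6, 4, 4, 2))
```

Every position would need its own feature extraction and classification, which is where the cost comes from.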
56
00:03:58,720 --> 00:04:05,370
So the previous section, which is basically manual feature extraction, I just mentioned that, and why would we
57
00:04:05,370 --> 00:04:10,740
want to actually manually find HOG features if CNNs actually eliminate that?
58
00:04:10,740 --> 00:04:16,350
All right, CNNs actually automatically find features by just running all this test and training data
59
00:04:16,680 --> 00:04:20,350
through the algorithm and finding the class, matching it with the correct class.
60
00:04:20,370 --> 00:04:22,770
So that's what's brilliant about CNNs.
61
00:04:22,770 --> 00:04:24,760
It takes that step away from us.
62
00:04:26,340 --> 00:04:31,970
So as I said, one of the problems with doing this is the issue of scale.
63
00:04:32,100 --> 00:04:34,920
Imagine this is a simple image just 20 by 20.
64
00:04:34,920 --> 00:04:36,870
So this box can be passed over here.
65
00:04:36,960 --> 00:04:39,630
But imagine this was a much bigger continue TV image.
66
00:04:39,720 --> 00:04:44,130
How many different times, how many different boxes would we extract?
67
00:04:44,130 --> 00:04:46,460
How do we know what size the box should be?
68
00:04:46,470 --> 00:04:50,410
I mean, that's where we rescale the image, but how many different rescalings are we going to do?
69
00:04:50,440 --> 00:04:54,830
So as you can see this is not a very effective way of doing object detection.
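To put rough numbers on the scale problem, here is a small plain-Python sketch that counts how many boxes get extracted when the image is repeatedly shrunk until the box no longer fits. The 20 by 20 size is from the lecture; the 1920 by 1080 size, the 8 pixel box, the stride of 4 and the 1.5 shrink factor are all assumed values picked for illustration.

```python
def count_windows(image_w, image_h, box, stride, scale_step=1.5):
    """Count square-box positions over an image pyramid.

    The image is shrunk by `scale_step` each round until the box no
    longer fits, and the positions at every scale are added up.
    """
    total = 0
    w, h = image_w, image_h
    while w >= box and h >= box:
        per_row = (w - box) // stride + 1
        per_col = (h - box) // stride + 1
        total += per_row * per_col
        w, h = int(w / scale_step), int(h / scale_step)
    return total


small = count_windows(20, 20, box=8, stride=4)      # the 20x20 example
large = count_windows(1920, 1080, box=8, stride=4)  # assumed larger frame
print(small, large)
```

With these settings the small image needs only a handful of boxes, while the large one needs far more, and each box still has to go through feature extraction and a classifier.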
70
00:04:56,430 --> 00:05:02,600
So to talk a bit about histogram of gradients, I'm not going to go into this in detail. I've taught
71
00:05:02,600 --> 00:05:05,480
this in my other OpenCV course.
72
00:05:05,480 --> 00:05:07,280
The video is included free in that section.
73
00:05:07,290 --> 00:05:09,230
So that's why I'm not going to talk about it much here.
74
00:05:09,550 --> 00:05:15,290
But basically the slides are here for you to go through on your own and you can pretty much infer from
75
00:05:15,290 --> 00:05:17,720
these steps here what HOGs really are.
76
00:05:20,110 --> 00:05:22,090
So now we move on to R-CNNs.