AI_DL_Assignment / 20. Principles of Object Detection /2. Object Detection Introduction - Sliding Windows with HOGs.srt
| 1 | |
| 00:00:00,470 --> 00:00:01,000 | |
| OK. | |
| 2 | |
| 00:00:01,050 --> 00:00:02,430 | |
| So let's start at the beginning. | |
| 3 | |
| 00:00:02,460 --> 00:00:05,520 | |
| Let's talk about object detection, really object detectors. | |
| 4 | |
| 00:00:05,670 --> 00:00:11,910 | |
| So I'm going to introduce you to the history of it first. Object detection is one of the holy grails of | |
| 5 | |
| 00:00:11,910 --> 00:00:17,610 | |
| computer vision because previously what we have been doing is just classifying like an entire image | |
| 6 | |
| 00:00:17,610 --> 00:00:20,510 | |
| and seeing what objects, or what classes, it belongs to. | |
| 7 | |
| 00:00:20,730 --> 00:00:26,490 | |
| But can we take an image like this and label each major component as being a dog, car, person, horse, | |
| 8 | |
| 00:00:26,760 --> 00:00:28,340 | |
| person in the back? | |
| 9 | |
| 00:00:28,350 --> 00:00:32,230 | |
| Not until we came across object detection. | |
| 10 | |
| 00:00:32,640 --> 00:00:40,620 | |
| So object detection is a mix of object classification and localization. Object localization is the identification | |
| 11 | |
| 00:00:40,650 --> 00:00:43,120 | |
| of a bounding box outlining the object. | |
| 12 | |
| 00:00:43,140 --> 00:00:49,590 | |
| So like with my face here, it basically extracts a bounding box around my face. And face detection is perhaps | |
| 13 | |
| 00:00:49,590 --> 00:00:53,760 | |
| one of the most popular object detection algorithms that we all know. | |
| 14 | |
| 00:00:53,830 --> 00:00:57,220 | |
| We're all quite familiar with it from using the cameras in our cell phones. | |
| 15 | |
| 00:00:57,270 --> 00:00:57,780 | |
| OK. | |
| 16 | |
| 00:00:58,290 --> 00:01:04,150 | |
| So basically, instead of just telling you this object here is a cat, | |
| 17 | |
| 00:01:04,170 --> 00:01:09,070 | |
| it actually tells you where the cat is, and that is the whole point of object detection. | |
| 18 | |
| 00:01:10,620 --> 00:01:15,340 | |
| So let's get into the history of it and start with Haar cascade classifiers. | |
| 19 | |
| 00:01:15,360 --> 00:01:19,140 | |
| Now, there were many object detectors before this. | |
| 20 | |
| 00:01:19,140 --> 00:01:24,840 | |
| However, the Haar cascade classifier is what made object detection mainstream and quite popular, because | |
| 21 | |
| 00:01:24,840 --> 00:01:26,340 | |
| it was so fast. | |
| 22 | |
| 00:01:26,370 --> 00:01:33,420 | |
| So basically, this was developed by Viola and Jones as a face detection algorithm in 2001, not | |
| 23 | |
| 00:01:33,420 --> 00:01:35,480 | |
| that long ago, 17 years ago. | |
| 24 | |
| 00:01:35,520 --> 00:01:40,960 | |
| To be fair, it was super fast, and it's actually still used in a number of applications. | |
| 25 | |
| 00:01:41,280 --> 00:01:43,710 | |
| Basically it's been optimized and tweaked to be even faster. | |
| 26 | |
| 00:01:43,710 --> 00:01:49,890 | |
| So it basically reduces the CPU load, and it's very, very accurate. | |
| 27 | |
| 00:01:49,890 --> 00:01:52,930 | |
| Basically, what it is is a cascade of classifiers. | |
| 28 | |
| 00:01:53,190 --> 00:01:56,640 | |
| That's basically how it got its name, and it uses Haar features. | |
| 29 | |
| 00:01:56,640 --> 00:01:58,590 | |
| Basically let's go into the next slide. | |
| 30 | |
| 00:01:58,660 --> 00:02:02,760 | |
| Actually, I don't have it in this section, but it basically uses Haar features, and Haar features are | |
| 31 | |
| 00:02:02,760 --> 00:02:06,210 | |
| basically like rectangles. | |
| 32 | |
| 00:02:06,250 --> 00:02:07,100 | |
| Overlaid here. | |
| 33 | |
| 00:02:07,240 --> 00:02:12,690 | |
| You imagine a white rectangle here and one here, and then there are different types of Haar cascade classifiers. | |
| 34 | |
| 00:02:12,810 --> 00:02:15,590 | |
| So basically it's just feature extraction, | |
| 35 | |
| 00:02:15,690 --> 00:02:22,350 | |
| basically what we learned before, and it slides this box, the window, over and over continuously, | |
| 36 | |
| 00:02:22,410 --> 00:02:31,950 | |
| looking for a face. They're very good, but they are pretty hard to train, develop and optimize. | |
| 37 | |
| 00:02:32,010 --> 00:02:38,010 | |
| So let's move on to histograms of oriented gradients (HOGs) with SVMs and sliding windows. Sliding windows is a method | |
| 38 | |
| 00:02:38,010 --> 00:02:43,580 | |
| where we extract segments of a full image piece by piece, in the form of a rectangular extraction box. | |
| 39 | |
| 00:02:43,590 --> 00:02:48,000 | |
| So I mentioned it in the previous slide, when I was talking about this box being slid across this image. | |
| 40 | |
| 00:02:48,330 --> 00:02:53,430 | |
| Here in this image is a picture of my wife from her last bodybuilding bikini competition | |
| 41 | |
| 00:02:53,430 --> 00:02:54,560 | |
| two months ago. | |
| 42 | |
| 00:02:54,870 --> 00:03:02,550 | |
| And what it does is, just imagine this window being moved here, then down here, and then down here, just | |
| 43 | |
| 00:03:02,550 --> 00:03:05,670 | |
| like how we moved across the image | |
| 44 | |
| 00:03:05,680 --> 00:03:07,960 | |
| in CNNs. It's exactly the same thing. | |
| 45 | |
| 00:03:07,970 --> 00:03:14,430 | |
| And we can actually set the same parameters, like stride and the size of this box. And what this box does | |
| 46 | |
| 00:03:14,430 --> 00:03:17,640 | |
| here, in sliding windows with histograms of oriented gradients | |
| 47 | |
| 00:03:17,700 --> 00:03:25,980 | |
| and SVMs, is that it basically extracts the HOG, the histogram of oriented gradients, in this box at different scales. | |
| 48 | |
| 00:03:25,980 --> 00:03:31,620 | |
| So basically it does it with the image at one scale, and then at another, smaller scale, and then this one | |
| 49 | |
| 00:03:31,620 --> 00:03:35,480 | |
| here, and this one basically has no room to go right, so it just goes straight down. | |
| 50 | |
| 00:03:35,760 --> 00:03:39,480 | |
| And it tries to match up the HOG against what it knows | |
| 51 | |
| 00:03:39,480 --> 00:03:41,700 | |
| it's supposed to look like, to find the object. | |
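As an aside not in the lecture: the descriptor the box extracts can be sketched in a few lines of NumPy. This toy function is my own illustration of one HOG building block, the orientation histogram of a single cell; real implementations (e.g. OpenCV's `HOGDescriptor`) also group cells into overlapping blocks before normalising.

```python
import numpy as np

def hog_cell_histogram(patch, bins=9):
    """Toy HOG cell: histogram of gradient orientations, weighted by magnitude."""
    patch = patch.astype(float)
    # Simple finite-difference gradients (prepend keeps the output shape equal to the input).
    gx = np.diff(patch, axis=1, prepend=patch[:, :1])
    gy = np.diff(patch, axis=0, prepend=patch[:1, :])
    magnitude = np.hypot(gx, gy)
    # Unsigned orientation in [0, 180) degrees, as standard HOG uses.
    orientation = np.degrees(np.arctan2(gy, gx)) % 180
    hist, _ = np.histogram(orientation, bins=bins, range=(0, 180), weights=magnitude)
    return hist / (np.linalg.norm(hist) + 1e-6)  # L2-normalise the histogram
```

A patch containing a vertical edge, for instance, puts nearly all of its weight into the 0-degree bin, which is the kind of signature the SVM then matches against.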
| 52 | |
| 00:03:42,000 --> 00:03:47,400 | |
| Now, as you can see, this could be an effective way, but it's not really that efficient. | |
| 53 | |
| 00:03:47,400 --> 00:03:48,410 | |
| Why? | |
| 54 | |
| 00:03:48,420 --> 00:03:53,400 | |
| Because imagine we have to do this for every segment of the image, continuously. | |
| 55 | |
| 00:03:53,400 --> 00:03:55,680 | |
| It gets exhaustive and computationally expensive. | |
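To make that cost concrete, here is a minimal sliding-window sketch. This is my own illustration, not code from the course, and the 64x64 window and stride of 16 are arbitrary choices:

```python
import numpy as np

def sliding_windows(image, win_size, stride):
    """Yield (x, y, crop) tuples by sliding a rectangular box across the image."""
    h, w = image.shape[:2]
    win_h, win_w = win_size
    for y in range(0, h - win_h + 1, stride):
        for x in range(0, w - win_w + 1, stride):
            yield x, y, image[y:y + win_h, x:x + win_w]

# Count how many crops a single pass produces on a small grayscale image.
image = np.zeros((240, 320))
windows = list(sliding_windows(image, win_size=(64, 64), stride=16))
print(len(windows))  # 12 rows * 17 columns = 204 windows at this one scale
```

Every one of those 204 crops would need its own HOG extraction and SVM evaluation, and that is before repeating the whole pass at other scales.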
| 56 | |
| 00:03:58,720 --> 00:04:05,370 | |
| So the previous approach is basically manual feature extraction, as I just mentioned. And why would we | |
| 57 | |
| 00:04:05,370 --> 00:04:10,740 | |
| want to actually manually find features if CNNs eliminate that? | |
| 58 | |
| 00:04:10,740 --> 00:04:16,350 | |
| All right, CNNs actually automatically find features by just running all this training data | |
| 59 | |
| 00:04:16,680 --> 00:04:20,350 | |
| through the algorithm, finding the loss, and matching it with the correct class. | |
| 60 | |
| 00:04:20,370 --> 00:04:22,770 | |
| So that's what's brilliant about CNNs. | |
| 61 | |
| 00:04:22,770 --> 00:04:24,760 | |
| It takes that step away from us. | |
| 62 | |
| 00:04:26,340 --> 00:04:31,970 | |
| So, as I said, one of the problems with doing this is the issue of scale. | |
| 63 | |
| 00:04:32,100 --> 00:04:34,920 | |
| Imagine this is a simple image just 20 by 20. | |
| 64 | |
| 00:04:34,920 --> 00:04:36,870 | |
| So this box can be passed over here. | |
| 65 | |
| 00:04:36,960 --> 00:04:39,630 | |
| But imagine this was a much bigger, high-definition image. | |
| 66 | |
| 00:04:39,720 --> 00:04:44,130 | |
| How many different times, how many different boxes, would we extract? | |
| 67 | |
| 00:04:44,130 --> 00:04:46,460 | |
| How do we know what size the box should be? | |
| 68 | |
| 00:04:46,470 --> 00:04:50,410 | |
| I mean, that's where we rescale the image, but how many different rescalings are we going to do? | |
| 69 | |
| 00:04:50,440 --> 00:04:54,830 | |
| So as you can see this is not a very effective way of doing object detection. | |
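The rescaling mentioned above is usually done with an image pyramid, and counting the windows shows why the approach explodes. This sketch is my own illustration, with an arbitrary 0.75 scale factor, a hypothetical 480x640 image, and a 64x64 window with stride 16:

```python
def pyramid_scales(size, min_size=64, factor=0.75):
    """Yield progressively smaller (height, width) sizes until the window no longer fits."""
    h, w = size
    while min(h, w) >= min_size:
        yield h, w
        h, w = int(h * factor), int(w * factor)

def windows_at(h, w, win=64, stride=16):
    """Number of win x win windows a stride-step pass produces on an h x w image."""
    return max(0, (h - win) // stride + 1) * max(0, (w - win) // stride + 1)

total = sum(windows_at(h, w) for h, w in pyramid_scales((480, 640)))
print(total)  # → 1962 windows across 7 scales, each needing its own HOG + SVM pass
```

Doubling the image resolution or halving the stride multiplies this count again, which is exactly the exhaustive cost the lecture is pointing at.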
| 70 | |
| 00:04:56,430 --> 00:05:02,600 | |
| So, to talk a bit about histograms of oriented gradients: I'm not going to go into this in detail. I taught | |
| 71 | |
| 00:05:02,600 --> 00:05:05,480 | |
| this in my other OpenCV course. | |
| 72 | |
| 00:05:05,480 --> 00:05:07,280 | |
| The video is included free in that section. | |
| 73 | |
| 00:05:07,290 --> 00:05:09,230 | |
| So that's why I'm not going to talk about it much here. | |
| 74 | |
| 00:05:09,550 --> 00:05:15,290 | |
| But basically the slides are here for you to go through on your own and you can pretty much infer from | |
| 75 | |
| 00:05:15,290 --> 00:05:17,720 | |
| these steps here what HOGs really are. | |
| 76 | |
| 00:05:20,110 --> 00:05:22,090 | |
| So now we move on to R-CNNs. | |