AI_DL_Assignment / 11. Assessing Model Performance /3. Finding and Viewing Misclassified Data.srt
1
00:00:00,830 --> 00:00:05,740
So welcome to 11.3, where we actually find and view our misclassified data.
2
00:00:06,120 --> 00:00:12,790
If you recall correctly, our last classifier was actually confusing sevens with twos.
3
00:00:13,110 --> 00:00:16,800
So how do we actually see the sevens and twos that our classifier
4
00:00:17,040 --> 00:00:18,860
is mixing up?
5
00:00:18,930 --> 00:00:22,180
So now, let's first talk a bit about why this is useful.
6
00:00:22,180 --> 00:00:27,210
Firstly, this is actually an underused technique; I don't see that many computer vision or data scientists
7
00:00:27,690 --> 00:00:30,800
using this technique to identify a classifier's weaknesses.
8
00:00:30,840 --> 00:00:35,850
I think it's crucial, because by looking at what it is misclassifying, you can actually figure out: oh, I need
9
00:00:35,850 --> 00:00:39,240
more of this type of data to make my classifier smarter.
10
00:00:39,240 --> 00:00:43,520
Maybe we need to augment more, to add some robustness to our classifier.
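The "augment more" idea can be sketched in plain NumPy, for example by randomly shifting digit images a couple of pixels; `augment_shift` is a hypothetical helper written for illustration, not code from this lesson:

```python
import numpy as np

def augment_shift(images, max_shift=2, seed=0):
    """Randomly shift each image by up to max_shift pixels in x and y.
    A cheap augmentation that adds some translation robustness."""
    rng = np.random.default_rng(seed)
    out = np.empty_like(images)
    for i, img in enumerate(images):
        dx, dy = rng.integers(-max_shift, max_shift + 1, size=2)
        out[i] = np.roll(np.roll(img, dy, axis=0), dx, axis=1)
    return out

# Toy batch of 28x28 "digit" images standing in for real training data
batch = np.random.rand(4, 28, 28)
augmented = augment_shift(batch)
print(augmented.shape)  # (4, 28, 28)
```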
11
00:00:43,890 --> 00:00:49,590
So viewing the misclassified test data can tell us a lot of things. Sometimes what is confusing it is the
12
00:00:49,590 --> 00:00:51,690
classes looking similar, even to us.
13
00:00:52,080 --> 00:00:56,550
Maybe it's a more complex pattern, maybe we need to add more deep layers, and maybe our training data
14
00:00:56,550 --> 00:01:02,340
is mislabeled. That actually happens quite a bit to me, because I tend to label a lot of my datasets myself,
15
00:01:02,370 --> 00:01:07,660
which is tedious, exhausting, and sometimes error-prone.
16
00:01:07,680 --> 00:01:09,370
So let's see how we do this.
17
00:01:09,380 --> 00:01:11,310
Now, on to our IPython notebook.
18
00:01:11,330 --> 00:01:16,110
But before I go ahead, let me just show you what this actually tells us.
19
00:01:16,110 --> 00:01:19,560
This is some real-life data that has been misclassified by our classifier.
20
00:01:20,020 --> 00:01:23,900
So it should be quite illustrative for you.
21
00:01:24,180 --> 00:01:30,060
But basically, what happens here is that this is the data input here, the input image.
22
00:01:30,090 --> 00:01:33,350
So it actually was a 6, but our classifier predicted a 0.
23
00:01:33,660 --> 00:01:38,410
Now, this is clearly a 6, so our classifier is doing something very wrong here.
24
00:01:38,520 --> 00:01:39,470
This one is an 8.
25
00:01:39,480 --> 00:01:45,270
But we can kind of forgive our classifier slightly, because it sort of looks like someone wrote a 2 intentionally,
26
00:01:45,700 --> 00:01:49,340
and then maybe their pen skipped and they weren't able to continue with it.
27
00:01:49,410 --> 00:01:55,340
So possibly, because the 2 is the most pronounced part of this number,
28
00:01:55,500 --> 00:01:58,210
I can see why it classified this as a 2.
29
00:01:58,470 --> 00:01:59,900
This one looks like a 9.
30
00:02:00,150 --> 00:02:02,420
Clearly a 9: our classifier predicted it was a 9.
31
00:02:02,430 --> 00:02:04,460
However, it actually was an 8.
32
00:02:04,470 --> 00:02:06,900
So when I said it was clearly a 9: it was actually an 8.
33
00:02:06,900 --> 00:02:14,220
Someone wrote it very poorly and basically made this bottom circle of the 8 very small, or perhaps it was
34
00:02:14,220 --> 00:02:15,070
mislabeled data.
35
00:02:15,120 --> 00:02:15,920
We don't even know.
36
00:02:16,140 --> 00:02:18,290
But let's trust our dataset for now,
37
00:02:18,390 --> 00:02:20,370
and let's assume this was an 8
38
00:02:20,400 --> 00:02:22,460
that was basically misclassified.
39
00:02:23,970 --> 00:02:29,230
This one is a 4, though it actually looks like a 9 to me, to be honest. My handwriting wasn't good either;
40
00:02:29,520 --> 00:02:32,380
I got scolded for that all the time in high school and primary school.
41
00:02:32,730 --> 00:02:33,330
So yeah.
42
00:02:33,360 --> 00:02:34,820
So we can understand that one.
43
00:02:34,890 --> 00:02:38,570
This one, I'd say, is definitely a 6, or a G even.
44
00:02:38,570 --> 00:02:41,010
I mean, these should all be digits, to be fair.
45
00:02:41,460 --> 00:02:47,460
So we can see it predicted a 5, but it should've gotten that as a 6, because, you know, you don't
46
00:02:47,460 --> 00:02:49,010
draw a 5 like this.
47
00:02:49,080 --> 00:02:50,230
So yeah.
48
00:02:50,250 --> 00:02:56,620
So let's go into our IPython notebook and see how we actually create plots or generate images like this.
49
00:02:57,120 --> 00:02:57,420
OK.
50
00:02:57,430 --> 00:03:03,330
So how do we find the misclassified data from our IPython notebook, from our Python code,
51
00:03:03,390 --> 00:03:04,430
basically.
52
00:03:04,530 --> 00:03:06,690
So let's think about this quickly, right?
53
00:03:06,810 --> 00:03:10,230
We have our test data labels and test data.
54
00:03:10,440 --> 00:03:12,340
And we have our training data as well.
55
00:03:12,360 --> 00:03:15,680
So how do we figure out which labels have been predicted wrong?
56
00:03:15,960 --> 00:03:17,310
And that's actually fairly easy.
57
00:03:17,310 --> 00:03:25,170
All we need to do is compare y_test with y_pred, and that's what this function, np.absolute,
58
00:03:25,170 --> 00:03:27,060
does, right?
59
00:03:27,120 --> 00:03:32,790
It basically creates an array that stores a nonzero value when a misclassification occurs.
60
00:03:32,790 --> 00:03:35,500
And basically, we use np.nonzero on this result.
61
00:03:35,760 --> 00:03:42,120
We basically create a mask here: when the result is greater than 0, which means that
62
00:03:42,120 --> 00:03:48,750
it's misclassified, we get the indices here, and these indices now will correspond to the actual
63
00:03:48,960 --> 00:03:54,250
digits in y_test and y_pred that were actually misclassified.
64
00:03:54,420 --> 00:03:56,220
So take index 247, for example.
65
00:03:56,310 --> 00:04:03,450
If you put some brackets in and index into it, that was an actual misclassified data image
66
00:04:03,540 --> 00:04:04,390
input.
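As a minimal sketch of this step, with made-up labels standing in for the real y_test and y_pred:

```python
import numpy as np

# Hypothetical ground-truth labels and predictions (digits 0-9)
y_test = np.array([7, 2, 1, 0, 4, 1, 4, 9])
y_pred = np.array([7, 2, 1, 6, 4, 1, 9, 9])

# A nonzero absolute difference marks a misclassification
result = np.absolute(y_test - y_pred)

# Indices of the misclassified test samples
misclassified_indices = np.nonzero(result > 0)[0]
print(misclassified_indices)  # -> [3 6]
```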
67
00:04:04,740 --> 00:04:08,290
So let's run this, and we get this here.
68
00:04:08,910 --> 00:04:11,700
And it does this quite quickly, as you can see.
69
00:04:11,700 --> 00:04:15,930
This is providing you actually have y_pred; we got y_pred earlier.
70
00:04:15,930 --> 00:04:21,150
Basically, if you remember correctly, from this here, model.predict_classes, and then we generated
71
00:04:21,150 --> 00:04:24,840
our confusion matrix and classification report.
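The confusion matrix and classification report come from scikit-learn's metrics module; here is a tiny self-contained sketch with toy labels standing in for the real test set (assuming scikit-learn is installed):

```python
import numpy as np
from sklearn.metrics import confusion_matrix, classification_report

# Toy labels standing in for the real y_test / y_pred
y_test = np.array([0, 1, 2, 2, 1, 0])
y_pred = np.array([0, 1, 2, 1, 1, 0])

# Rows are true classes, columns are predicted classes
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
```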
72
00:04:24,840 --> 00:04:30,120
So now let's display it using OpenCV. I've actually commented out some lines here, and that's because
73
00:04:30,120 --> 00:04:35,850
if you wanted to load a model: that would assume we didn't train this model here, and you just wanted to load
74
00:04:35,940 --> 00:04:42,000
a model, you can load it in here as "classifier", just change "model" to "classifier" here, and basically do
75
00:04:42,000 --> 00:04:42,710
the same thing.
76
00:04:42,750 --> 00:04:48,230
And I've commented out this line here, which was a print statement where I was just printing the labels.
77
00:04:48,300 --> 00:04:48,860
OK.
78
00:04:49,110 --> 00:04:53,730
We're going to actually display it all in one image for the first 10 misclassifications.
79
00:04:53,730 --> 00:04:54,810
So let's take a look.
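A rough sketch of this display step, using matplotlib in place of the OpenCV routine from the notebook; plot_misclassified and the noise images are made up for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

def plot_misclassified(x_test, y_test, y_pred, indices, n=10):
    """Show the first n misclassified images, titled 'predicted / true'."""
    n = min(n, len(indices))
    fig, axes = plt.subplots(1, n, figsize=(2 * n, 2))
    for ax, idx in zip(np.atleast_1d(axes), indices[:n]):
        ax.imshow(x_test[idx], cmap="gray")
        ax.set_title(f"{y_pred[idx]} / {y_test[idx]}", color="green")
        ax.axis("off")
    return fig

# Noise images standing in for x_test; labels disagree at indices 1 and 3
x_test = np.random.rand(5, 28, 28)
y_test = np.array([3, 8, 1, 6, 0])
y_pred = np.array([3, 4, 1, 5, 0])
fig = plot_misclassified(x_test, y_test, y_pred, indices=[1, 3])
print(len(fig.axes))  # 2
```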
80
00:04:57,390 --> 00:05:01,640
So there it is now; let's look at this.
81
00:05:01,760 --> 00:05:02,580
Exactly.
82
00:05:02,900 --> 00:05:08,470
So what it tells us is that this was the input image, and this is what it predicted, in green.
83
00:05:08,690 --> 00:05:10,830
And this is the actual true value, a 4.
84
00:05:11,180 --> 00:05:12,000
So that's interesting.
85
00:05:12,080 --> 00:05:13,180
Let's take a look at another one.
86
00:05:15,180 --> 00:05:17,560
This one is actually a 6, predicted as a 0.
87
00:05:18,100 --> 00:05:19,680
Let's keep going.
88
00:05:19,680 --> 00:05:21,130
This one is an 8, actually,
89
00:05:21,140 --> 00:05:22,410
but it predicted a 4.
90
00:05:22,680 --> 00:05:23,490
Same for this one.
91
00:05:23,490 --> 00:05:24,030
This one.
92
00:05:24,020 --> 00:05:25,780
This one... this is pretty cool.
93
00:05:26,010 --> 00:05:30,630
So you can keep going and go through all the misclassifications if you want to see what is actually confusing
94
00:05:30,670 --> 00:05:31,360
your classifier.