3a

3b

3c

Based on our output, we can see that dark pixels make up the majority of the image. That is why the histogram is piled up on the left (dark) side, with a long tail toward the right. The dark pixels span several different shades, and their counts vary much more than those of the white pixels. The tail of the distribution toward the right represents the lighter pixels, which make up a much smaller portion of the image than the darker pixels.
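As a rough illustration of how this reading of the histogram could be checked numerically, the sketch below builds a synthetic grayscale image (a hypothetical stand-in, not the assignment's image), bins its intensities, and measures the share of dark pixels and the sample skewness. The threshold of 128 and the bin count of 32 are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical grayscale image: beta-distributed so most values are dark.
image = (rng.beta(2, 5, size=(100, 100)) * 255).astype(np.uint8)

pixels = image.ravel().astype(float)
# Intensity histogram over the full 0..255 range.
counts, bin_edges = np.histogram(pixels, bins=32, range=(0, 256))

# Share of pixels darker than mid-gray (128 is an assumed threshold).
dark_fraction = float(np.mean(pixels < 128))

# Sample skewness: positive when the long tail points toward bright values.
centered = pixels - pixels.mean()
skewness = float(np.mean(centered ** 3) / pixels.std() ** 3)

print(f"dark fraction: {dark_fraction:.2f}, skewness: {skewness:.2f}")
```

A dark fraction well above 0.5 together with a positive skewness matches the picture described above: most of the mass at low intensities and a thin tail of bright pixels.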

3e

3f

3g

3h

3i

3i

Based on the results, the overall accuracy is very low. The confusion matrix output shows the breakdown for each label value.

One thing to note is that our k-means clustering cannot account for noise, so noise pixels will be classified incorrectly. However, across the other regions the accuracy is reasonably good. Consider the confusion matrix for label 1 (Urban):

[[3872083, 1349976], [537263, 80502]]

This gives an accuracy of approximately 68% for this label, which is not that bad.
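The arithmetic behind that figure can be reproduced directly from the matrix. This is a minimal sketch that assumes the matrix follows scikit-learn's per-label layout, [[TN, FP], [FN, TP]]. The source does not state the layout, but accuracy is the diagonal sum over the total either way.

```python
# Confusion matrix for label 1 (Urban), copied from the output above.
# Assumed layout: [[TN, FP], [FN, TP]].
(tn, fp), (fn, tp) = [[3872083, 1349976], [537263, 80502]]

total = tn + fp + fn + tp
# Accuracy is the share of pixels on the diagonal (correct decisions).
accuracy = (tn + tp) / total

print(f"total pixels: {total}")
print(f"accuracy: {accuracy:.3f}")
```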

So the model needs either to account for noise, or to have noise pixels excluded from the accuracy scores and results.
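The second option, leaving noise out of the score, could look roughly like the sketch below. The label value `NOISE = 0`, the helper name `accuracy_ignoring`, and the toy label lists are all hypothetical and only illustrate the idea. The real label encoding may differ.

```python
NOISE = 0  # hypothetical label value marking noise pixels


def accuracy_ignoring(y_true, y_pred, ignore_label):
    """Accuracy computed only over pixels whose true label is not ignore_label."""
    kept = [(t, p) for t, p in zip(y_true, y_pred) if t != ignore_label]
    if not kept:
        raise ValueError("no pixels left after filtering")
    return sum(t == p for t, p in kept) / len(kept)


# Toy labels (made up for illustration): two noise pixels, both misclassified.
y_true = [0, 1, 1, 2, 0, 2, 1]
y_pred = [1, 1, 2, 2, 2, 2, 1]

overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
filtered = accuracy_ignoring(y_true, y_pred, NOISE)

print(f"overall: {overall:.2f}, ignoring noise: {filtered:.2f}")
```

Reporting both numbers side by side makes it clear how much of the low overall accuracy comes from noise pixels and how much from the real classes.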

Another recommendation is to increase the number of clusters. Many of these regions have distinct color schemes, so the color portion of the feature set could have higher predictive power with finer clusters. With the current reclassification, we have over-generalized by merging too many categories.

One way to improve this is to use a hyperparameter tuning method such as grid search over the number of clusters k, with cross-validation to find the k that performs best.
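A minimal sketch of that idea follows. Because k-means is unsupervised, it uses a manual loop over candidate k values rather than scikit-learn's `GridSearchCV`: each cluster is mapped to the majority true label in the training fold, and accuracy is then measured on the held-out fold. The helper names `cluster_accuracy` and `select_k`, the candidate list, and the synthetic two-feature data are assumptions for illustration, not the assignment's pipeline.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.model_selection import KFold


def cluster_accuracy(X_train, y_train, X_val, y_val, k, seed=0):
    """Fit k-means on the training fold, label clusters by majority vote,
    and return accuracy on the validation fold."""
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X_train)
    # Map each cluster id to the most common true label among its members.
    cluster_to_label = np.zeros(k, dtype=y_train.dtype)
    for c in range(k):
        members = y_train[km.labels_ == c]
        if members.size:
            cluster_to_label[c] = np.bincount(members).argmax()
    y_pred = cluster_to_label[km.predict(X_val)]
    return float(np.mean(y_pred == y_val))


def select_k(X, y, candidate_ks, n_splits=3, seed=0):
    """Return the k with the best mean cross-validated accuracy, plus all scores."""
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = {}
    for k in candidate_ks:
        fold_scores = [
            cluster_accuracy(X[tr], y[tr], X[va], y[va], k, seed)
            for tr, va in folds.split(X)
        ]
        scores[k] = float(np.mean(fold_scores))
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Synthetic stand-in for pixel features (hypothetical, three separated classes).
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(100, 2)) for c in centers])
y = np.repeat(np.arange(3), 100)

best_k, scores = select_k(X, y, candidate_ks=[2, 3, 4, 5])
print(best_k, scores)
```

Accuracy under majority-vote mapping tends not to decrease as k grows, so in practice it may be worth preferring the smallest k whose score is close to the best one.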

With these recommendations, we would likely see better results!