AI Assignment: K-Means Clustering, Initialization, and Elbow Method

Added on 2023/05/30

AI Summary

This assignment solution addresses several aspects of K-Means clustering, a fundamental algorithm in machine learning. The first part uses the scikit-learn library to run K-Means on a sample dataset and shows how to predict cluster assignments and read the cluster centers. The second part is an HTML and JavaScript page for visualizing K-Means clustering and the elbow method: the HTML sets up the structure, while the JavaScript uses a library to draw number lines and elbow charts for choosing the number of clusters (k). It also explains variable initialization in programming and why it matters, and implements the elbow method in Python with scikit-learn, NumPy, and Matplotlib. The final part gives the estimated values of Θ1 and Θ2.

Q 1.
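from sklearn.cluster import KMeans
import numpy as np

# Six 2-D points forming two groups (x = 1 and x = 4)
X = np.array([[1, 2], [1, 4], [1, 0],
              [4, 2], [4, 4], [4, 0]])
# Fit K-Means with two clusters; random_state fixes the initialization
kmeans = KMeans(n_clusters=2, random_state=0).fit(X)
print(kmeans.labels_)  # cluster index of each training point
print(kmeans.predict([[0, 0], [4, 4]]))  # clusters of two new points
print(kmeans.cluster_centers_)  # coordinates of the two centers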
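For a fitted K-Means model, predict assigns each new point to the cluster whose center is nearest. The sketch below reproduces that rule with NumPy alone. It assumes the two centers are the means of the x = 1 and x = 4 groups, and the helper name assign_to_nearest is illustrative, not part of scikit-learn. The cluster indices that scikit-learn reports may come out in the opposite order.

```python
import numpy as np

X = np.array([[1, 2], [1, 4], [1, 0],
              [4, 2], [4, 4], [4, 0]], dtype=float)

# Assumed centers: the mean of the first three points and of the last three.
centers = np.array([X[:3].mean(axis=0), X[3:].mean(axis=0)])

def assign_to_nearest(points, centers):
    """Return, for each point, the index of the closest center (Euclidean)."""
    points = np.asarray(points, dtype=float)
    # distances[i, j] is the distance from point i to center j
    distances = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return np.argmin(distances, axis=1)

new_points = [[0, 0], [4, 4]]
print(centers)
print(assign_to_nearest(new_points, centers))
```

Under this ordering of the centers, [0, 0] falls in the first cluster and [4, 4] in the second.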
Q 2.
a).
<!DOCTYPE html>
<meta charset="utf-8">
<style>
html, body {
height: 100%;
}
body {
margin: 0;
padding: 0;
overflow: hidden;
font-size: 12px;
font-family: Arial, sans-serif;
}
#maindiv {
width: 960px;

height: 380px;
}
.dataset-a, .dataset-b {
display: inline-block;
width: 400px;
padding: 0 0 0 50px;
}
#button {
margin: 20px 50px;
}
#error {
margin: 20px 50px;
font-size: 20px;
color: red;
}
</style>
<body>
<script src="moebio_framework.min.js"></script>
<script>
var uniform = []; // please enter values from dataset
var clustered = []; // please enter values from dataset
var elbowData = {};
var maxK = 5;
var newData = false;
var g;

}
function drawNumberLine(g, dataset, label, offset) {
var lineLen = 350;
var tickLen = 5;
var x = 50 + (offset || 0);
var y = 80;
var radius = 5;
var min = 0;
var max = 1;
if (newData) {
min = mo.NumberList.fromArray(dataset).getMin();
max = mo.NumberList.fromArray(dataset).getMax();
}
var range = max - min;
g.setStroke('#777');
g.setFill('rgba(125,125,125,0.5)');
// x-axis
g.line(x, y, x + lineLen, y);
// Ticks
g.line(x, y, x, y + tickLen);
g.line(x + lineLen / 2, y, x + lineLen / 2, y + tickLen);
g.line(x + lineLen, y, x + lineLen, y + tickLen);
// Draw each data point
dataset.forEach(function(d) {

g.fCircle(x + ((d - min) / range) * lineLen, y - 1.5 * radius, radius);
});
// Labels
g.setText('#555', 12, 'Arial', "center");
g.fText(min.toFixed(2), x, y + tickLen);
g.fText(((min + max) / 2).toFixed(2), x + lineLen / 2, y + tickLen);
g.fText(max.toFixed(2), x + lineLen, y + tickLen);
g.setText('#555', 16, 'Arial', 'center', 'bottom', 'bold');
g.fText(label, x + lineLen / 2, y - 4 * radius);
}
function drawElbowChart(g, datasetName, offset) {
var xLineLen = 350;
var yLineLen = 200;
var tickLen = 5;
var x = 50 + (offset || 0);
var y = 130;
var elbow = elbowData[datasetName];
var sseMax = elbow.map(function(pair) { return pair[1] });
sseMax = mo.NumberList.fromArray(sseMax).getMax();
g.setStroke("#777");
// Draw axes
g.line(x, y + yLineLen, x + xLineLen, y + yLineLen);
g.line(x, y + yLineLen, x, y);
// x-axis ticks and labels
for (var i = 1; i <= maxK; ++i) {

init: computeData,
cycle: function() {
this.setText('#555', 18, 'Arial', 'center', 'bottom', 'bold');
this.fText("K-means clustering SSE vs. number of clusters for two random datasets", 450, 20);
this.setStroke("#aaa");
this.line(150, 25, 750, 25);
drawNumberLine(this, clustered, "Dataset A");
drawNumberLine(this, uniform, "Dataset B", 450);
drawElbowChart(this, 'clustered');
drawElbowChart(this, 'uniform', 450);
}
});
g.setBackgroundColor('white');
}
function inputChange() {
function parseInput(id) {
var input = document.getElementById("input-" + id);
var value = input.value;
var dataset = value.split(",").map(function(d) {
var val = Number(d);
if (isNaN(val) || !isFinite(val) || d.trim().length === 0) {
throw "Error parsing Dataset " + id.toUpperCase();
}
return val;
});
if (id == "a") {
clustered = dataset;

} else {
uniform = dataset;
}
}
try {
var id = "a";
parseInput(id);
id = "b";
parseInput(id);
computeData();
newData = true;
var errorDiv = document.getElementById('error');
errorDiv.innerHTML = "";
} catch (e) {
var errorDiv = document.getElementById('error');
errorDiv.innerHTML = e;
}
}
window.onload = function() {
var inputA = document.getElementById('input-a');
var inputB = document.getElementById('input-b');
inputA.value = clustered.join(", ");
inputB.value = uniform.join(", ");
setup();

}
</script>
<div id="maindiv"></div>
<div class="dataset-a">
Dataset A: <input type="text" id="input-a" size="45">
</div>
<div class="dataset-b">
Dataset B: <input type="text" id="input-b" size="45">
</div>
<div id="button">
<button type="button" onclick="inputChange()">Parse datasets</button>
</div>
<div id="error"></div>
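The page above plots SSE (sum of squared errors) against the number of clusters for two one-dimensional datasets, and reads the values as [k, SSE] pairs in drawElbowChart. The function that produces those pairs does not appear in this listing, so the following is only a sketch of how such pairs could be computed, written in Python with NumPy. The function names, the quantile-based starting centers, and the sample values are assumptions made for illustration.

```python
import numpy as np

def kmeans_1d_sse(values, k, n_iter=100):
    """Run Lloyd's algorithm on 1-D data and return the final SSE."""
    x = np.asarray(values, dtype=float)
    # Assumed deterministic start: k evenly spaced quantiles of the data.
    centers = np.quantile(x, (np.arange(k) + 0.5) / k)
    for _ in range(n_iter):
        # Assignment step: each value goes to its nearest center.
        labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        # Update step: each center moves to the mean of its members.
        new_centers = centers.copy()
        for j in range(k):
            members = x[labels == j]
            if members.size:  # an empty cluster keeps its old center
                new_centers[j] = members.mean()
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
    return float(np.sum((x - centers[labels]) ** 2))

def elbow_pairs(values, max_k=5):
    """Return [(k, SSE)] for k = 1..max_k, the shape the chart code reads."""
    return [(k, kmeans_1d_sse(values, k)) for k in range(1, max_k + 1)]

# Hypothetical sample: two tight groups on the interval [0, 1].
sample = [0.10, 0.12, 0.15, 0.80, 0.82, 0.85]
for k, sse in elbow_pairs(sample):
    print(k, round(sse, 4))
```

For data with two tight groups, the SSE drops sharply from k = 1 to k = 2 and changes little afterwards, which is the bend the elbow chart is meant to reveal.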
b)
c).
d).
e).
Initialization is the assignment of a defined starting value to a variable or
constant before a computer program uses it.
Initialization plays a key role in programming because every variable used in
the code occupies a certain amount of memory. If that memory is not given a
defined value at the start of the code's execution, the variable holds whatever
bits happen to be stored there; this is usually termed a garbage value.
If a variable holds a garbage value, the logic that depends on it changes and
the program produces incorrect output. Some languages and compilers do not
leave a garbage value: they either set the variable to a default such as null
or reject the use of an uninitialized variable with a compile-time error.
Initialization is done either by statically embedding the value at compile
time, or else by assignment at run time. Initialization is important because,
historically, uninitialized data has been a common source of bugs.
If a variable is not initialized where it is declared, it must at least be
assigned a valid value before it is first read, so that any garbage data is
overwritten and the program gives the desired output.
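As a small illustration of the points above, the Python sketch below contrasts an explicitly initialized accumulator with a variable that is read before it is assigned. The function names are made up for this example. Python does not expose garbage values for ordinary variables; it raises an error instead. np.empty is the closest analogue of uninitialized memory, since it allocates an array without setting its contents.

```python
import numpy as np

def column_sums(rows):
    """Sum each column, starting from an explicitly initialized accumulator."""
    totals = np.zeros(len(rows[0]))  # initialized: every entry starts at 0.0
    for row in rows:
        totals += row
    return totals

def read_before_assignment():
    """Show what happens when a local variable is read before it has a value."""
    try:
        total += 1  # 'total' is read here before any assignment
    except UnboundLocalError:
        return "uninitialized"
    return "ok"

# np.empty allocates memory without setting it, so the contents are arbitrary
# until they are overwritten with valid values.
scratch = np.empty(3)
scratch.fill(0.0)  # overwrite before the first read

print(column_sums([[1, 2], [3, 4]]))
print(read_before_assignment())
print(scratch)
```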
f).
#clustering dataset
# determine k using elbow method
from sklearn.cluster import KMeans
from sklearn import metrics
from scipy.spatial.distance import cdist
import numpy as np
import matplotlib.pyplot as plt
x1 = np.array([])  # input dataset 1 values
x2 = np.array([])  # input dataset 2 values
plt.plot()
plt.xlim([0, 10])
plt.ylim([0, 10])
plt.title('Dataset')
plt.scatter(x1, x2)
plt.show()

# create new plot and data
plt.plot()
X = np.array(list(zip(x1, x2))).reshape(len(x1), 2)
colors = ['b', 'g', 'r']
markers = ['o', 'v', 's']
# k means determine k
distortions = []
K = range(1, 6)  # candidate values k = 1..5 (assumes the original 5 was the maximum k)
for k in K:
    kmeanModel = KMeans(n_clusters=k).fit(X)
    # distortion: mean distance from each point to its nearest cluster center
    nearest = np.min(cdist(X, kmeanModel.cluster_centers_, 'euclidean'), axis=1)
    distortions.append(nearest.sum() / X.shape[0])
# Plot the elbow
plt.plot(K, distortions, 'bx-')
plt.xlabel('k')
plt.ylabel('Distortion')
plt.title('The Elbow Method showing the optimal k')
plt.show()
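The elbow is normally read off the plot by eye. As a rough cross-check, one possible heuristic (an assumption here, not part of the solution above) is to pick the k at which the distortion curve bends most sharply, measured by the largest second difference. The function name and the sample distortion values below are hypothetical.

```python
import numpy as np

def elbow_by_second_difference(ks, distortions):
    """Return the k at which the distortion curve bends most sharply."""
    ks = list(ks)
    d = np.asarray(distortions, dtype=float)
    if len(ks) != len(d) or len(ks) < 3:
        raise ValueError("need at least three (k, distortion) pairs")
    # Discrete second difference at each interior k: d[k-1] - 2*d[k] + d[k+1]
    second_diff = d[:-2] - 2 * d[1:-1] + d[2:]
    return ks[1 + int(np.argmax(second_diff))]

# Hypothetical distortions for k = 1..5, with a clear bend at k = 2.
print(elbow_by_second_difference(range(1, 6), [10.0, 3.0, 2.5, 2.2, 2.0]))
```

This only works when the curve has one pronounced bend; for a smooth curve, such as the one a uniform dataset tends to give, the result is not meaningful and the plot should be inspected directly.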
Q 3.
The estimated values are
Θ1 = 0.4491
Θ2 = 2.25
The approach with formula used