Compare commits


7 Commits

Author SHA1 Message Date
d5b768962c labs: Add eighth lab 2024-11-28 23:01:35 +01:00
25846ac643 labs: Add seventh base code 2024-11-21 17:52:19 +01:00
ae84532d96 labs: Add partial sixth lab 2024-11-21 16:21:30 +01:00
50287674fd labs: Add fifth lab 2024-11-07 18:10:04 +01:00
b805c7b53f labs: Add third lab (partial) 2024-10-31 16:23:09 +01:00
2c93bc6d68 labs: Add second lab (partial) 2024-10-31 16:22:50 +01:00
829e235442 labs: Add second lab (partial) 2024-10-31 16:22:11 +01:00
15 changed files with 808717 additions and 0 deletions

500001
Labs/Lab 5/darknet_traces.csv Normal file

File diff suppressed because it is too large

2184
Labs/Lab 5/lab_5.ipynb Normal file

File diff suppressed because one or more lines are too long

919
Labs/Lab 6/lab_6.ipynb Normal file

@@ -0,0 +1,919 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "426a8016",
"metadata": {},
"source": [
"<center><b><font size=6>Lab-6 A classifier from scratch<b><center>"
]
},
{
"cell_type": "markdown",
"id": "a39139f5",
"metadata": {},
"source": [
"### Objective: Implement, use and evaluate a classifier (without using specific libraries such as sklearn)\n",
"1. **Logistic regression** is a binary classification method that maps a linear combination of parameters and variables into two possible classes. Here, you will implement the logistic regression from scratch to better understand how an ML algorithm works. Useful link: <a href=\"https://en.wikipedia.org/wiki/Logistic_regression\">Wiki</a>.\n",
"2. **Performance evaluation metrics** are needed to evaluate the outcome of prediction with respect to true labels. Here, you will implement confusion matrix, accuracy, precision, recall and F-measure. Useful link: <a href=\"https://en.wikipedia.org/wiki/Confusion_matrix\">Wiki</a>."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "b6bf32f9",
"metadata": {},
"outputs": [],
"source": [
"# import needed python libraries\n",
"\n",
"%matplotlib inline\n",
"\n",
"import pandas as pd\n",
"import seaborn as sns\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"import random"
]
},
{
"cell_type": "markdown",
"id": "c0959af0",
"metadata": {},
"source": [
"### 1. Dataset - TCP logs\n",
"The dataset contains traffic information generated by an open-source passive network monitoring tool, namely **tstat**. It automates the collection of packet statistics of traffic aggregates, using real-time monitoring features. Being a passive tool, the typical usage scenario is live monitoring of Internet links, in which all transmitted packets are observed. In case of TCP, Tstat identifies a new flow start when it observes a TCP three-way handshake. Similarly, it identifies a TCP flow end either when it sees the TCP connection teardown, or when it doesnt observe packets for some time (idle time). A flow is defined by a unique link between the sender and receiver, e.g., a tuple of <em>(IP_Protocol_Type, IP_Source_Address, Source_Port, IP_Destination_Address, Destination_Port)</em>. For a specific flow, tstat calculates a number of statistics of all the packets transmitted over this flow, and then generate a log for such flow with multiple attributes (statistics). A log file is arranged as a simple table where each column is associated to specific information and each row reports the flow during a connection. The log information is a summary of the flow properties. For instance, in the TCP log we can find columns like the starting time of a TCP connection, its duration, the number of sent and received packets, the observed Round Trip Time.\n",
"![](tstat.png)\n",
"\n",
"In this lab, since the focus is on the development of logistic regression from scratch, we only consider a portion of the dataset for simplicity. The data can be found in `log_tcp_part.csv`, in which there are multiple columns, the last one is the class label, indicating the flow is from either **google** or **youtube**, and the rest are features. Your job is a binary classification task to classify the domain of each flow (row) **from scratch**, including:\n",
"- Build a logistic regression model,\n",
"- Evaluate the performance."
]
},
{
"cell_type": "markdown",
"id": "8fc1d837",
"metadata": {},
"source": [
"1. Load the dataset.\n",
"2. Get the list of features (columns 1 to 10).\n",
"3. Add a new column and assign numerical class labels of -1 and 1 to google and youtube.\n",
"4. Answering the following questions:\n",
" - How many features do we have?\n",
" - How many samples do we have in total?\n",
" - How many samples do we have for each class? Are they similar?"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "70294ef9",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/tmp/ipykernel_226018/230400442.py:3: FutureWarning: Downcasting behavior in `replace` is deprecated and will be removed in a future version. To retain the old behavior, explicitly call `result.infer_objects(copy=False)`. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`\n",
" df_tcp.replace({\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>c_msgsize_count</th>\n",
" <th>c_pktsize6</th>\n",
" <th>c_msgsize4</th>\n",
" <th>s_msgsize4</th>\n",
" <th>s_pktsize2</th>\n",
" <th>s_rtt_cnt</th>\n",
" <th>s_rtt_std</th>\n",
" <th>s_msgsize5</th>\n",
" <th>c_msgsize6</th>\n",
" <th>c_sit3</th>\n",
" <th>class</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1418</td>\n",
" <td>0</td>\n",
" <td>0.000000</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0.000</td>\n",
" <td>-1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>0.466732</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0.000</td>\n",
" <td>-1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>0.413304</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0.000</td>\n",
" <td>-1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1418</td>\n",
" <td>1</td>\n",
" <td>0.000000</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0.000</td>\n",
" <td>-1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1418</td>\n",
" <td>0</td>\n",
" <td>0.000000</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0.000</td>\n",
" <td>-1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>19995</th>\n",
" <td>4</td>\n",
" <td>0</td>\n",
" <td>37</td>\n",
" <td>0</td>\n",
" <td>1418</td>\n",
" <td>3</td>\n",
" <td>22.224528</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>3.334</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>19996</th>\n",
" <td>6</td>\n",
" <td>45</td>\n",
" <td>45</td>\n",
" <td>57</td>\n",
" <td>1418</td>\n",
" <td>2</td>\n",
" <td>0.000000</td>\n",
" <td>45</td>\n",
" <td>45</td>\n",
" <td>1.252</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>19997</th>\n",
" <td>4</td>\n",
" <td>0</td>\n",
" <td>1205</td>\n",
" <td>0</td>\n",
" <td>531</td>\n",
" <td>4</td>\n",
" <td>15.323660</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>4975.694</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>19998</th>\n",
" <td>4</td>\n",
" <td>0</td>\n",
" <td>690</td>\n",
" <td>0</td>\n",
" <td>767</td>\n",
" <td>4</td>\n",
" <td>17.997651</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1719.125</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>19999</th>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0.000000</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0.000</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>20000 rows × 11 columns</p>\n",
"</div>"
],
"text/plain": [
" c_msgsize_count c_pktsize6 c_msgsize4 s_msgsize4 s_pktsize2 \\\n",
"0 1 0 0 0 1418 \n",
"1 1 0 0 0 0 \n",
"2 1 0 0 0 0 \n",
"3 1 0 0 0 1418 \n",
"4 1 0 0 0 1418 \n",
"... ... ... ... ... ... \n",
"19995 4 0 37 0 1418 \n",
"19996 6 45 45 57 1418 \n",
"19997 4 0 1205 0 531 \n",
"19998 4 0 690 0 767 \n",
"19999 1 0 0 0 0 \n",
"\n",
" s_rtt_cnt s_rtt_std s_msgsize5 c_msgsize6 c_sit3 class \n",
"0 0 0.000000 0 0 0.000 -1 \n",
"1 3 0.466732 0 0 0.000 -1 \n",
"2 3 0.413304 0 0 0.000 -1 \n",
"3 1 0.000000 0 0 0.000 -1 \n",
"4 0 0.000000 0 0 0.000 -1 \n",
"... ... ... ... ... ... ... \n",
"19995 3 22.224528 0 0 3.334 1 \n",
"19996 2 0.000000 45 45 1.252 1 \n",
"19997 4 15.323660 0 0 4975.694 1 \n",
"19998 4 17.997651 0 0 1719.125 1 \n",
"19999 1 0.000000 0 0 0.000 1 \n",
"\n",
"[20000 rows x 11 columns]"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_tcp = pd.read_csv('log_tcp_part.csv')\n",
"features = df_tcp.columns[:-1] # Remove class\n",
"df_tcp.replace({\n",
" \"class\": {\n",
" \"google\": -1,\n",
" \"youtube\": 1,\n",
" }\n",
"}, inplace=True)\n",
"\n",
"df_tcp"
]
},
{
"cell_type": "code",
"execution_count": 41,
"id": "48d85d94",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Number of features: 10\n",
"Number of samples: 20000\n",
"Number of samples of google: 10000\n",
"Number of samples of youtube: 10000\n"
]
}
],
"source": [
"num_features = features.size\n",
"num_samples = len(df_tcp)\n",
"num_google = len(df_tcp.loc[df_tcp[\"class\"] == -1])\n",
"num_youtube = len(df_tcp.loc[df_tcp[\"class\"] == 1])\n",
"\n",
"print(f\"Number of features: {num_features}\")\n",
"print(f\"Number of samples: {num_samples}\")\n",
"print(f\"Number of samples of google: {num_google}\")\n",
"print(f\"Number of samples of youtube: {num_youtube}\")"
]
},
{
"cell_type": "markdown",
"id": "c1c8cc80",
"metadata": {},
"source": [
"### 2. Implement your logistic regression learning algorithm\n",
"Here you will need to construct a class in which you need to define two functions besides the class initialization:\n",
"- `fit`. In this method you will perform ERM. Learn the parameters of the model (i.e., the hypothesis h) from training with gradient descent\n",
"- `predict`. In this method given one sample x (or more) you will perform the inference $sign(h(x))$ to obtain class labels.\n",
"\n",
"Hints:\n",
"\n",
"- The linear function used in the logistic regression is the following: $h(x)=w^T x +b $, where b is a scalar bias.\n",
"- Logistic loss: $L((x,y),h)=\\log(1+e^{-y h(x)})$\n",
"- ERM: $\\min_{w,b} f(w,b)=\\frac{1}{m}\\sum_{i=1}^{m} \\log(1+e^{-y^{(i)} h(x^{(i)})})$\n",
"- Gradient for weight: $\\nabla_w f(w,b) = \\frac{1}{m} \\sum_i \\frac{-y^{(i)}x^{(i)}}{(1+e^{y^{(i)}h(x^{(i)})})}$\n",
"- Gradient for bias: $\\nabla_b f(w,b)= \\frac{1}{m} \\sum_i \\frac{-y^{(i)}}{(1+e^{y^{(i)}h(x^{(i)})})}$\n",
"- Update the parameters: $w \\leftarrow w - \\alpha \\nabla w$, $b \\leftarrow b - \\alpha \\nabla b$\n",
"\n",
"Notice that the sigmoid function $f(z) = \\frac{1}{1 + e^{-z}}$ appears multiple times. You can write also a method for the sigmoid function to help you in the computation. By considering f(z), the gradients rewrite as:\n",
"\n",
"- Gradient for weight: $\\nabla_w f(w,b) = \\frac{1}{m} \\sum_i ({f(h(x^{(i)})) - y^{(i)}})x^{(i)}$\n",
"- Gradient for bias: $\\nabla_b f(w,b) = \\frac{1}{m} \\sum_i ({f(h(x^{(i)})) - y^{(i)}})$"
]
},
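{
"cell_type": "markdown",
"id": "a1b2c3d4",
"metadata": {},
"source": [
"As a quick sanity check (not part of the original lab), the analytic gradients above can be compared against finite differences on a tiny synthetic batch. The helper names below (`loss`, `grad_w`) are illustrative, and the labels are taken in $\\{0,1\\}$ to match the rewritten gradient form."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a1b2c3d5",
"metadata": {},
"outputs": [],
"source": [
"# Minimal sketch: compare the analytic gradient with finite differences.\n",
"rng = np.random.default_rng(0)\n",
"Xc = rng.normal(size=(5, 3))    # 5 samples, 3 features\n",
"yc = np.array([1, 0, 1, 0, 1])  # labels in {0, 1} for this gradient form\n",
"\n",
"def loss(w, b):\n",
"    # average cross-entropy of the logistic model on the toy batch\n",
"    s = 1 / (1 + np.exp(-(Xc @ w + b)))\n",
"    return -np.mean(yc * np.log(s) + (1 - yc) * np.log(1 - s))\n",
"\n",
"def grad_w(w, b):\n",
"    # analytic gradient from the rewritten form above\n",
"    s = 1 / (1 + np.exp(-(Xc @ w + b)))\n",
"    return Xc.T @ (s - yc) / len(yc)\n",
"\n",
"w0, b0, eps = np.zeros(3), 0.0, 1e-6\n",
"num = np.array([(loss(w0 + eps * np.eye(3)[j], b0) - loss(w0, b0)) / eps for j in range(3)])\n",
"print(np.allclose(num, grad_w(w0, b0), atol=1e-4))  # expected: True\n"
]
},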
{
"cell_type": "code",
"execution_count": 176,
"id": "90a02f52",
"metadata": {},
"outputs": [],
"source": [
"def sigmoid(z):\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"\n",
"class LogisticRegression:\n",
" def __init__(self, learning_rate, num_iterations):\n",
" self.learning_rate = learning_rate\n",
" self.num_iterations = num_iterations\n",
"\n",
"\n",
" def h(self, X):\n",
" return np.dot(X, self.w) + self.b\n",
" \n",
" \n",
" def gradient_step_w(self, m, X, y):\n",
" h = self.h(X)\n",
" f = sigmoid(h)\n",
" s = np.dot(X.T, np.subtract(f, y))\n",
"\n",
" return s/m\n",
" \n",
"\n",
" def gradient_step_b(self, m, X, y):\n",
" h = self.h(X)\n",
" f = sigmoid(h)\n",
" s = np.subtract(f, y).sum()\n",
" \n",
" return s/m\n",
"\n",
"\n",
" def fit(self, X, y):\n",
" self.w = np.zeros((X.shape[1]))\n",
" self.b = 0\n",
" m = len(X)\n",
" \n",
" for i in range(self.num_iterations):\n",
" w_step = self.gradient_step_w(m, X, y)\n",
" b_step = self.gradient_step_b(m, X, y)\n",
"\n",
" self.w -= self.learning_rate*w_step\n",
" self.b -= self.learning_rate*b_step\n",
"\n",
" y_predict = np.transpose(self.predict(X))==y\n",
" correct_predictions = np.count_nonzero(y_predict == True)\n",
" accuracy = correct_predictions/len(y)\n",
" print(accuracy)\n",
" \n",
"\n",
" def predict(self, X):\n",
" if self.w is None or self.b is None:\n",
" raise ValueError\n",
" \n",
" p = self.h(X)\n",
" return np.sign(p)"
]
},
{
"cell_type": "markdown",
"id": "cc478b78",
"metadata": {},
"source": [
"### 3. Use the model\n",
"- Initialize your model with predefined learning rate of `0.1` and iterations of `100`.\n",
"- Fit your model with features and targets.\n",
"- Get the prediction with features."
]
},
{
"cell_type": "code",
"execution_count": 177,
"id": "af5a590d",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.5768\n",
"0.5845333333333333\n",
"0.5632\n",
"0.6148\n",
"0.5904666666666667\n",
"0.617\n",
"0.5955333333333334\n",
"0.5915333333333334\n",
"0.6082666666666666\n",
"0.5925333333333334\n",
"0.6115333333333334\n",
"0.5924666666666667\n",
"0.6012666666666666\n",
"0.5922666666666667\n",
"0.6109333333333333\n",
"0.5952\n",
"0.5922\n",
"0.5996666666666667\n",
"0.5904666666666667\n",
"0.6062\n",
"0.5915333333333334\n",
"0.5988\n",
"0.5916\n",
"0.5979333333333333\n",
"0.5917333333333333\n",
"0.5962\n",
"0.5933333333333334\n",
"0.5955333333333334\n",
"0.5945333333333334\n",
"0.5947333333333333\n",
"0.5946\n",
"0.5946666666666667\n",
"0.5947333333333333\n",
"0.5946666666666667\n",
"0.5946666666666667\n",
"0.5946666666666667\n",
"0.5946\n",
"0.5946\n",
"0.5946666666666667\n",
"0.5945333333333334\n",
"0.5946666666666667\n",
"0.5945333333333334\n",
"0.5944666666666667\n",
"0.5946\n",
"0.5945333333333334\n",
"0.5946\n",
"0.5946\n",
"0.5946666666666667\n",
"0.5946\n",
"0.5945333333333334\n",
"0.5946666666666667\n",
"0.5946\n",
"0.5946\n",
"0.5946\n",
"0.5946666666666667\n",
"0.5945333333333334\n",
"0.5946\n",
"0.5946666666666667\n",
"0.5945333333333334\n",
"0.5946666666666667\n",
"0.5946666666666667\n",
"0.5946\n",
"0.5947333333333333\n",
"0.5946666666666667\n",
"0.5947333333333333\n",
"0.5946\n",
"0.5947333333333333\n",
"0.5946666666666667\n",
"0.5946\n",
"0.5946666666666667\n",
"0.5946\n",
"0.5946\n",
"0.5946\n",
"0.5946\n",
"0.5946\n",
"0.5946666666666667\n",
"0.5947333333333333\n",
"0.5946666666666667\n",
"0.5947333333333333\n",
"0.5948\n",
"0.5948\n",
"0.5948\n",
"0.5948\n",
"0.5948666666666667\n",
"0.5947333333333333\n",
"0.5948\n",
"0.5948666666666667\n",
"0.5947333333333333\n",
"0.5948666666666667\n",
"0.5947333333333333\n",
"0.5947333333333333\n",
"0.5947333333333333\n",
"0.5947333333333333\n",
"0.5948\n",
"0.5946666666666667\n",
"0.5948\n",
"0.5947333333333333\n",
"0.5946666666666667\n",
"0.5948\n",
"0.5947333333333333\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n"
]
}
],
"source": [
"from sklearn.model_selection import train_test_split\n",
"\n",
"x_train, x_test, y_train, y_test = train_test_split(df_tcp.drop(columns=[\"class\"], inplace=False), df_tcp[\"class\"])\n",
"\n",
"lr = LogisticRegression(0.1, 100)\n",
"lr.fit(x_train, y_train.values)"
]
},
{
"cell_type": "code",
"execution_count": 174,
"id": "beda67a9",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.6056"
]
},
"execution_count": 174,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"y_predict = np.transpose(lr.predict(x_test))==y_test.values\n",
"correct_predictions = np.count_nonzero(y_predict == True)\n",
"accuracy = correct_predictions/len(y_test)\n",
"\n",
"accuracy"
]
},
{
"cell_type": "code",
"execution_count": 175,
"id": "8db63dad",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.5965333333333334"
]
},
"execution_count": 175,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"y_predict = np.transpose(lr.predict(x_train))==y_train.values\n",
"correct_predictions = np.count_nonzero(y_predict == True)\n",
"accuracy = correct_predictions/len(y_train)\n",
"\n",
"accuracy"
]
},
{
"cell_type": "markdown",
"id": "bc5ad9e7",
"metadata": {},
"source": [
"### 4. Model evaluation\n",
"With predicted class labels and ground truths, we now evaluate the model performance through confusion matrix and numerical metrics. Specifically, you need to derive the following:\n",
"- Confusion matrix - Note that, you should indicate the corresponding quantity of each element in the table. Here positive is class 1 and negative is class -1:\n",
"\\begin{array}{|c|c|c|}\n",
"\\hline\n",
" & \\textbf{Predicted Positive} & \\textbf{Predicted Negative} \\\\\n",
"\\hline\n",
"\\textbf{Actual Positive} & \\text{True Positive (TP)} & \\text{False Negative (FN)} \\\\\n",
"\\hline\n",
"\\textbf{Actual Negative} & \\text{False Positive (FP)} & \\text{True Negative (TN)} \\\\\n",
"\\hline\n",
"\\end{array}\n",
"- Precision of each class and the average value:\n",
"$\\frac{\\text{True Positive (TP)}}{\\text{True Positive (TP) + False Positive (FP)}}$\n",
"- Recall of each class and the average value:\n",
"$\\frac{\\text{True Positive (TP)}}{\\text{True Positive (TP) + False Negative (FN)}}$\n",
"- F1-score of each class and the average value:\n",
"$F_1 = \\frac{2 \\times \\text{Precision} \\times \\text{Recall}}{\\text{Precision} + \\text{Recall}}$\n",
"- Accuracy:\n",
"$\\frac{\\text{True Positive (TP) + True Negative (TN)}}{\\text{True Positive (TP) + True Negative (TN) + False Positive (FP) + False Negative (FN)}}$\n",
"- Answering the following questions:\n",
" - Do you have same performance between classes? If not, which one performs better?\n",
" - Change the parameters of learning rate or number of iterations. Do you have same performance? Better or Worse? Why?"
]
},
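{
"cell_type": "markdown",
"id": "b2c3d4e5",
"metadata": {},
"source": [
"A minimal from-scratch sketch of the requested metrics (the helper names are illustrative, not the official solution). It assumes `y_true` and `y_pred` are arrays with labels in {-1, 1}, e.g. `y_test.values` and `lr.predict(x_test)`, and takes class 1 as the positive class."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b2c3d4e6",
"metadata": {},
"outputs": [],
"source": [
"# Minimal sketch: confusion matrix and derived metrics for labels in {-1, 1}.\n",
"def confusion_matrix(y_true, y_pred):\n",
"    tp = np.sum((y_true == 1) & (y_pred == 1))\n",
"    fn = np.sum((y_true == 1) & (y_pred == -1))\n",
"    fp = np.sum((y_true == -1) & (y_pred == 1))\n",
"    tn = np.sum((y_true == -1) & (y_pred == -1))\n",
"    return tp, fn, fp, tn\n",
"\n",
"def metrics(y_true, y_pred):\n",
"    tp, fn, fp, tn = confusion_matrix(y_true, y_pred)\n",
"    precision = tp / (tp + fp)\n",
"    recall = tp / (tp + fn)\n",
"    f1 = 2 * precision * recall / (precision + recall)\n",
"    accuracy = (tp + tn) / (tp + fn + fp + tn)\n",
"    return precision, recall, f1, accuracy\n",
"\n",
"# example usage: metrics(y_test.values, lr.predict(x_test))\n"
]
},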
{
"cell_type": "code",
"execution_count": null,
"id": "15b74982",
"metadata": {},
"outputs": [],
"source": [
"# your answers here"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.2"
},
"varInspector": {
"cols": {
"lenName": 16,
"lenType": 16,
"lenVar": 40
},
"kernels_config": {
"python": {
"delete_cmd_postfix": "",
"delete_cmd_prefix": "del ",
"library": "var_list.py",
"varRefreshCmd": "print(var_dic_list())"
},
"r": {
"delete_cmd_postfix": ") ",
"delete_cmd_prefix": "rm(",
"library": "var_list.r",
"varRefreshCmd": "cat(var_dic_list()) "
}
},
"types_to_exclude": [
"module",
"function",
"builtin_function_or_method",
"instance",
"_Feature"
],
"window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 5
}

20001
Labs/Lab 6/log_tcp_part.csv Normal file

File diff suppressed because it is too large

BIN
Labs/Lab 6/tstat.png Normal file

Binary file not shown (new file, 132 KiB).

140001
Labs/Lab 7/RTP_dataset.csv Normal file

File diff suppressed because it is too large

2265
Labs/Lab 7/lab_7.ipynb Normal file

File diff suppressed because one or more lines are too long

Binary file not shown (new file, 130 KiB).

140001
Labs/Lab 8/RTP_dataset.csv Normal file

File diff suppressed because it is too large

Binary file not shown.

1303
Labs/Lab 8/lab_8.ipynb Normal file

File diff suppressed because one or more lines are too long

BIN
Labs/Lab 8/validation.png Normal file

Binary file not shown (new file, 101 KiB).

765
Labs/Lab2 - Numpy.ipynb Normal file

@@ -0,0 +1,765 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "b1ea060a47a6211d",
"metadata": {},
"source": [
"# LAB #2: Numpy\n",
"\n",
"## Introduction\n",
"In this laboratory, you will perform some operation with NumPy arrays in such a way to build your first Machine Learning model. \n",
"In particular, you will build a NumPy-based version of the K-Nearest Neighbors algorithm (a.k.a. KNN).\n",
"\n",
"## 0 Preliminary steps\n",
"### 0.1 NumPy\n",
"Make sure you have the NumPy library installed, its use is strongly recommended for this laboratory.\n",
"NumPy is the fundamental package for scientific computing with Python. You can read more about it on\n",
"the official documentation.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9246699975edf562",
"metadata": {},
"outputs": [],
"source": [
"! pip install numpy"
]
},
{
"cell_type": "markdown",
"id": "ad497ed1d0092203",
"metadata": {},
"source": [
"### 0.2 Iris dataset download \n",
"For this lab, you will need two of the datasets you have already met: Iris and MNIST. Please refer to\n",
"Laboratory 1 for a complete description of the datasets.\n",
"Iris. You can download it from:\n",
"https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a838a5ed77a24051",
"metadata": {},
"outputs": [],
"source": [
"# linux users\n",
"# !wget https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data -O iris.csv\n",
"# windows users\n",
"! pip install wget\n",
"import wget\n",
"wget.download(\"https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data\", \"iris.csv\")"
]
},
{
"cell_type": "markdown",
"id": "ef169d9060adb9a7",
"metadata": {},
"source": [
"## 1 Exercises \n",
"Note that exercises marked with a ($\\star$) are optional, you should focus on completing the other ones first."
]
},
{
"cell_type": "markdown",
"id": "a820274dc6b6f678",
"metadata": {},
"source": [
"## 1.1 Iris Analysis with Numpy\n",
"As you might remember from Lab. 1, the Iris dataset collects the measurements of different Iris flowers,\n",
"and each data point is characterized by 4 **features** (sepal length, sepal width, petal length, petal width) and is associated to 1 **label** (i.e. an Iris species - Setosa, Versicolor, or Virginica) which in this case is the last element of the row (last column of the csv file). "
]
},
{
"cell_type": "markdown",
"id": "46864c46cf9f9387",
"metadata": {},
"source": [
"1. Load the Iris dataset. You can use the `csv` library that we saw in the last laboratory or read it with the standard `open(filename, strategy)`. \n",
"In the second case remember to split correctly the different fields, and avoid new line characters. In any case check for empty lines. \n",
"This time remember to store the 4 features in a numpy array `x` of shape (n_sample, 4) and the labels in a different array `y` of shape (n_sample,) converting the 3 different species to a corresponding numerical value. E.g.,\n",
" - Iris-setosa: 0\n",
" - Iris-versicolor: 1\n",
" - Iris-virginica: 2\n",
"\n",
"In order to check you have correctly loaded the data, print the shape of the two arrays: you should find\n",
"(150, 4) for `x` and (150,) for `y`."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "a977ccc88ef2ca39",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(150, 4)\n",
"(150,)\n"
]
}
],
"source": [
"import numpy as np\n",
"\n",
"def type_mapper(type):\n",
" match type:\n",
" case b\"Iris-setosa\":\n",
" return 0\n",
" case b\"Iris-versicolor\":\n",
" return 1\n",
" case b\"Iris-virginica\":\n",
" return 2\n",
" \n",
" return -1\n",
"\n",
"raw_csv = np.loadtxt(\"iris.csv\",\n",
" delimiter=\",\", dtype=float, converters={4:type_mapper})\n",
"\n",
"x = raw_csv[:,0:4]\n",
"y = raw_csv[:,4]\n",
"\n",
"print(x.shape)\n",
"print(y.shape)"
]
},
{
"cell_type": "markdown",
"id": "5050d162966956ce",
"metadata": {},
"source": [
"2. Compute again the mean and standard deviation for each class by means of the numpy functions"
]
},
{
"cell_type": "code",
"execution_count": 31,
"id": "33bfaed602d4bc3e",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Metrics for specie 0\n",
"Sepal length for mean: 5.006, std_dev: 0.3489469873777391\n",
"Sepal width mean: 3.418, std_dev: 0.37719490982779713\n",
"Petal length mean: 1.464, std_dev: 0.17176728442867112\n",
"Petal width mean: 0.244, std_dev: 0.10613199329137281\n",
"\n",
"Metrics for specie 1\n",
"Sepal length for mean: 5.936, std_dev: 0.5109833656783751\n",
"Sepal width mean: 2.7700000000000005, std_dev: 0.31064449134018135\n",
"Petal length mean: 4.26, std_dev: 0.4651881339845203\n",
"Petal width mean: 1.3259999999999998, std_dev: 0.19576516544063705\n",
"\n",
"Metrics for specie 2\n",
"Sepal length for mean: 6.587999999999998, std_dev: 0.6294886813914926\n",
"Sepal width mean: 2.974, std_dev: 0.3192553836664309\n",
"Petal length mean: 5.5520000000000005, std_dev: 0.546347874526844\n",
"Petal width mean: 2.0260000000000002, std_dev: 0.2718896835115301\n",
"\n"
]
}
],
"source": [
"for i in range(3):\n",
" iris = x[np.ma.masked_where(y, y==i)]\n",
"\n",
" print(f\"Metrics for specie {i}\")\n",
" print(f\"Sepal length for mean: {iris[:,0].mean()}, std_dev: {iris[:,0].std()}\")\n",
" print(f\"Sepal width mean: {iris[:,1].mean()}, std_dev: {iris[:,1].std()}\")\n",
" print(f\"Petal length mean: {iris[:,2].mean()}, std_dev: {iris[:,2].std()}\")\n",
" print(f\"Petal width mean: {iris[:,3].mean()}, std_dev: {iris[:,3].std()}\")\n",
" print()"
]
},
{
"cell_type": "markdown",
"id": "1f84beb708797ba9",
"metadata": {},
"source": [
"3. Compute the distances among two samples (e.g., the $36^{th}$ and the $81^{th}$, the $13^{th}$ and the $15^{th}$) \n",
"by means of the `np.linalg.norm(a-b)` function which computes the norm of `a-b`, i.e., the euclidean distance between the feature of the `a` and of the `b` samples. \n",
" - Can you guess if the two couples of samples belong to the same species?\n",
" - From the mean and standard deviations computed before can you guess which species? "
]
},
{
"cell_type": "code",
"execution_count": 32,
"id": "4a47fb722be07fb4",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"2.7892651361962706\n",
"1.4317821063276353\n"
]
}
],
"source": [
"print(np.linalg.norm(x[35]-x[81]))\n",
"print(np.linalg.norm(x[12]-x[14]))"
]
},
{
"cell_type": "markdown",
"id": "9dc024bce0c0dd04",
"metadata": {
"collapsed": false
},
"source": [
"TODO: write your comment here"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fd802b47b8519bb3",
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
" "
]
},
{
"cell_type": "markdown",
"id": "f3fa448bd7bc9d94",
"metadata": {
"collapsed": false
},
"source": [
"TODO: write your comment here"
]
},
{
"cell_type": "markdown",
"id": "dcceaccd4a1a7526",
"metadata": {
"collapsed": false
},
"source": [
"4. Find the k nearest neighbors of a sample in the dataset.\n",
" - Define a function `k_nearest_neighbors(x, x_set, k)` that takes as input a sample `x` and a set of sample (i.e., a matrix) `x_set` and returns the indices of the `k` nearest neighbors of `x` in `x_set`.\n",
" - Reuse the `euclidean_distance` function that you defined before to do so. \n",
" - Remember that the `x_set` is a matrix of shape ($N_{samples}, N_{features}$), so you have to compute the distance between `x` and each row of `x_set`. \n",
" - In order to find the indices of the `k` nearest neighbors, you can use the `argsort` function that returns the indices that would sort an array\n",
" - Apply the function to the $36^{th}$ sample of the dataset with $k=5$.\n",
" - Print the indices of the $5$ nearest neighbors.\n",
" - Print the labels of the $5$ nearest neighbors. Can you guess the label of the $36^{th}$ sample?"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "b93f94748b3841e3",
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Label of 0 nearest neighbor: 0.0\n",
"Label of 1 nearest neighbor: 0.0\n",
"Label of 2 nearest neighbor: 0.0\n",
"Label of 3 nearest neighbor: 0.0\n",
"Label of 4 nearest neighbor: 0.0\n",
"Real label: 0.0\n"
]
}
],
"source": [
"def k_nearest_neighbors(x: np.ndarray, x_set: np.ndarray, k: int):\n",
" distances = np.linalg.norm(x-x_set, axis=1)\n",
" distances_sorted = np.argsort(distances)\n",
"\n",
" return distances_sorted[0:k]\n",
"\n",
"indices = k_nearest_neighbors(x[35], x, 5)\n",
"for i, k in enumerate(indices):\n",
" print(f\"Label of {i} nearest neighbor: {y[k]}\")\n",
"\n",
"print(f\"Real label: {y[35]}\")"
]
},
{
"cell_type": "markdown",
"id": "4de2b1c8798fc98e",
"metadata": {},
"source": [
"TODO: write your comment here"
]
},
{
"cell_type": "markdown",
"id": "9dd1f94b256663e8",
"metadata": {},
"source": [
"## 1.2 KNN design and implementation\n",
"In this exercise, you will implement your own version of the K-Nearest Neighbors (KNN) algorithm, and you will use it to assign an\n",
"Iris species (i.e. a label) to flowers whose species is unknown.\n",
"\n",
"The KNN algorithm is straightforward. Suppose that some measurements (e.g., the iris features) and their\n",
"relative label (e.g., the iris species) of a set of samples are known in advance. \n",
"\n",
"<img src=\"https://mlarchive.com/wp-content/uploads/2022/09/img2.png\" width=\"800\">\n",
"\n",
"Then, whenever we want to label a new sample, we look at the K most similar points (a.k.a. neighbors) and assign a label accordingly. \n",
"\n",
"<img src=\"https://mlarchive.com/wp-content/uploads/2022/09/img1-1.png\" width=\"800\">\n",
"\n",
"\n",
"The simplest solution is using a majority voting scheme: if the majority of the neighbors votes for a label, we will go for it. \n",
"This approach is naive only at first sight: the local similarity assumed by KNN happens to be roughly true, as you have seen in the previous exercises.\n",
"Even though this reasoning does not generalize well, the KNN provides a valid baseline for your tasks.\n"
]
},
{
"cell_type": "markdown",
"id": "5d185976071690ce",
"metadata": {},
"source": [
"1. Lets identify a portion of our data for which we will try to guess the species. Randomly select 20%\n",
"of the records and store the first four columns (i.e. the features representing each flower) into a\n",
"two-dimensional numpy array of shape ($N_{test}, 4$), you can call it `X_test` and $N_{test}$ is the 20% of the total number of samples.\n",
"For the same records, store the test label column (i.e. the one with the species values) into another array, namely `y_test`. \n",
"This is the data that will be used to test the accuracy of your KNN implementation and its correct functioning (i.e. the testing data)."
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "a642f03b563650e8",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[1. 0. 0. 2. 2. 0. 2. 1. 1. 1. 1. 1. 2. 2. 2. 1. 1. 2. 2. 2. 1. 2. 2. 2.\n",
" 0. 2. 1. 0. 2. 1.]\n"
]
}
],
"source": [
"test_subset_indices = np.random.choice(len(y), size=int(len(y)*0.2), replace=False)\n",
"X_test = x[test_subset_indices]\n",
"Y_test = y[test_subset_indices]\n",
"\n",
"x[test_subset_indices]\n",
"\n",
"print(Y_test)"
]
},
{
"cell_type": "markdown",
"id": "192e5663358e8e82",
"metadata": {},
"source": [
"2. Store the remaining 80% of the records in the same way. In this case, use the names X_train andy_train for the arrays.\n",
"This is the data that your model will use as ground-truth knowledge (i.e. the training data, from which we extract the knowledge and that we will use for comparison).\n"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "b9f1639cc7fe3b53",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 1. 1.\n",
" 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.\n",
" 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2.\n",
" 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2.]\n"
]
}
],
"source": [
"train_subset_indices = [i not in test_subset_indices for i in range(len(y))]\n",
"X_train = x[train_subset_indices]\n",
"Y_train = y[train_subset_indices]\n",
"\n",
"print(Y_train)"
]
},
{
"cell_type": "markdown",
"id": "dbbc62af2fef1d5c",
"metadata": {},
"source": [
"3. Focus now on the KNN technique. \n",
"From the next month, you will use the `scikit-learn` package. Many of its functionalities\n",
"are exposed via an object-oriented interface. With this paradigm in mind, implement now the KNN\n",
"algorithm and expose it as a Python class. The bare skeleton of your class should look like this (you\n",
"are free to add other methods if you want to).\n",
"\n",
"```\n",
"class KNearestNeighbors:\n",
" def __init__(self, k):\n",
" \"\"\"\n",
" Store the value of k in a attribute of the class and initialize other attributes.\n",
" :param k : int, number of neighbors to consider.\n",
" \"\"\"\n",
" pass # TODO: implement it!\n",
" def fit(self, X, y):\n",
" \"\"\"\n",
" Store the 'prior knowledge' of you model that will be used\n",
" to predict new labels.\n",
" :param X : input data points, ndarray, shape = (R,C).\n",
" :param y : input labels, ndarray, shape = (R,).\n",
" \"\"\"\n",
" pass # TODO: implement it!\n",
" \n",
" def predict(self, X):\n",
" \"\"\"Run the KNN classification on X.\n",
" :param X: input data points, ndarray, shape = (N,C).\n",
" :return: labels : ndarray, shape = (N,).\n",
" \"\"\"\n",
" pass # TODO: implement it!\n",
"\n",
"```\n",
"\n",
"\n",
"Implement the `__init__` and `fit` methods first. \n",
"- In the `__init__` method, you should store the value of `k` in a private attribute of the class.\n",
"- In the `fit` method you should only store the training data in private attributes of the class."
]
},
{
"cell_type": "code",
"execution_count": 104,
"id": "b5de6a78df7f8585",
"metadata": {
"ExecuteTime": {
"end_time": "2024-10-10T12:53:39.426246Z",
"start_time": "2024-10-10T12:53:39.420295Z"
}
},
"outputs": [],
"source": [
"class KNearestNeighbors:\n",
" def __init__(self, k):\n",
" \"\"\"\n",
" Store the value of k in a attribute of the class and initialize other attributes.\n",
" :param k : int, number of neighbors to consider.\n",
" \"\"\"\n",
" self.k = k\n",
"\n",
" def fit(self, X, y):\n",
" \"\"\"\n",
" Store the 'prior knowledge' of you model that will be used\n",
" to predict new labels.\n",
" :param X : input data points, ndarray, shape = (R,C).\n",
" :param y : input labels, ndarray, shape = (R,).\n",
" \"\"\"\n",
" self.X = x\n",
" self.y = y\n",
"\n",
" def vote(self, labels: np.ndarray):\n",
" voting = np.unique(labels, return_counts=True)\n",
" return voting[0][voting[1].argmax()]\n",
"\n",
" \n",
" def predict(self, X):\n",
" \"\"\"Run the KNN classification on X.\n",
" :param X: input data points, ndarray, shape = (N,C).\n",
" :return: labels : ndarray, shape = (N,).\n",
" \"\"\"\n",
" distances = [np.linalg.norm(x-self.X, axis=1) for x in X]\n",
" distances_sorted = np.argsort(distances)\n",
" nearest_neighbors_labels = y[distances_sorted[:,0:self.k]]\n",
"\n",
" return np.apply_along_axis(self.vote, 1, nearest_neighbors_labels)"
]
},
{
"cell_type": "markdown",
"id": "6ad6f4fc7071bff0",
"metadata": {},
"source": [
"4. Implement the `predict` method. The function receives as input a numpy array with N rows and C\n",
"columns, corresponding to N flowers. The method assigns to each row one of the three Iris species \n",
"using the KNN algorithm, and returns the predicted species as a numpy array. \n",
"\n",
" - For finding nearest neighbours, you can either re-use the previously defined `k_nearest_neighbors` function or \n",
"implement a new one exploiting the numpy broadcasting capabilities in order to avoid iterating over the sample matrix `X`.\n",
" - Then, assign the *predicted label* to each sample using a majority voting scheme, i.e., the label that appears most frequently among the k nearest neighbors. To do so you can use the `np.unique(neighbours_labels, return_count=True)` function that returns the unique labels and their counts. \n",
" - Finally, return the predicted labels as a numpy array."
]
},
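{
"cell_type": "markdown",
"id": "c3d4e5f6",
"metadata": {},
"source": [
"A minimal sketch of the broadcasting idea mentioned above, on toy arrays: adding singleton axes lets NumPy compute all pairwise distances in one shot, without a Python loop over the query samples."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c3d4e5f7",
"metadata": {},
"outputs": [],
"source": [
"# Minimal sketch: pairwise Euclidean distances via broadcasting.\n",
"# A has shape (N, C), B has shape (R, C); the result has shape (N, R).\n",
"A = np.arange(6.0).reshape(2, 3)   # 2 query points\n",
"B = np.arange(12.0).reshape(4, 3)  # 4 reference points\n",
"dists = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)\n",
"print(dists.shape)  # (2, 4)\n"
]
},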
{
"cell_type": "code",
"execution_count": 13,
"id": "c227627e47cc7253",
"metadata": {
"ExecuteTime": {
"end_time": "2024-10-10T13:03:44.621187Z",
"start_time": "2024-10-10T13:03:44.609767Z"
}
},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"id": "4cbd1131d3ba785d",
"metadata": {},
"source": [
"5. Now lets fit the KNN model with the X_train and y_train data. Then, try to use your KNN model\n",
"to predict the species for each record in X_test and store them in a nupy array called y_pred.\n",
"As we did in the previous lab, check how many Iris species in the array y_pred have been guessed correctly computing with respect to the ones in y_test computing the accuracy. \n",
" - A prediction is correct if `y_pred[i] == y_test[i]`. To get the accuracy then compute the ratio between the number of correct guesses and the total number of guesses is known. \n",
" - If all labels are assigned correctly ((y_pred == y_test).all() == True), the accuracy of the model is 100%. \n",
" - Instead, if none of the guessed species corresponds to the real one ((y_pred == y_test).any() == False), the accuracy is 0%\n"
]
},
{
"cell_type": "code",
"execution_count": 112,
"id": "ca4f0b4bbe44c9fe",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.8666666666666667\n"
]
}
],
"source": [
"knn = KNearestNeighbors(5)\n",
"knn.fit(X_train, Y_train)\n",
"predictions = knn.predict(X_test)\n",
"correct_guesses = predictions == Y_test\n",
"accuracy = np.count_nonzero(correct_guesses == True) / len(correct_guesses)\n",
"print(accuracy)"
]
},
{
"cell_type": "markdown",
"id": "7514fc82de74b729",
"metadata": {},
"source": [
"6. ($\\star$) As a software developer, you might want to increase the functionalities of your product and\n",
"publish newer versions over time. The better your code is structured and organized, the lower is the\n",
"effort to release updates.\n",
"As such, extend your KNN implementation adding the parameter `distance`. This has to be one among:\n",
" - Euclidean distance: $ euclidean(p,q) = \\sqrt{\\sum_{i=1}^{n} (p_i _- q_i)^2} $\n",
" - Manhattan distance: $ manhattan(p,q) = \\sum_{i=1}^n |p_i - q_i|$\n",
" - Cosine distance: $ cosine(p, q) = 1 - \\frac{\\sum_{i=1}^n p_i q_i}{ \\sqrt{\\sum^n_{i=1} p^2_i} \\cdot \\sqrt{\\sum^n_{i=1} q_i^2}}$\n",
"\n",
"If any of this distance is not already implemented in `numpy` implement it yourself"
]
},
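{
"cell_type": "markdown",
"id": "d4e5f607",
"metadata": {},
"source": [
"A minimal sketch of the three metrics (the helper names are my own, not required by the lab): each takes a single point `p` of shape (C,) and a matrix `Q` of shape (R, C), and returns the R distances."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d4e5f608",
"metadata": {},
"outputs": [],
"source": [
"# Minimal sketch: the three distances, vectorized over the rows of Q.\n",
"def euclidean(p, Q):\n",
"    return np.sqrt(((Q - p) ** 2).sum(axis=1))\n",
"\n",
"def manhattan(p, Q):\n",
"    return np.abs(Q - p).sum(axis=1)\n",
"\n",
"def cosine(p, Q):\n",
"    num = (Q * p).sum(axis=1)\n",
"    den = np.linalg.norm(p) * np.linalg.norm(Q, axis=1)\n",
"    return 1 - num / den\n"
]
},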
{
"cell_type": "code",
"execution_count": null,
"id": "436c6395a2f3d853",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"id": "24c76d735fe65dbd",
"metadata": {},
"source": [
"\n",
"7. ($\\star$) Again, extend now your KNN implementation by adding the parameter `weights` to the constructor,\n",
"as shown below:\n",
"\n",
"```\n",
"class KNearestNeighbors:\n",
" def __init__(self, k, distance_metric=\"euclidean\", weights=\"uniform\"):\n",
" self.k = k\n",
" self.distance_metric = distance_metric\n",
" self.weights = weights\n",
"```\n",
"\n",
"Change your KNN implementation to accept a new weighting scheme for the labels. If weights=\n",
"\"distance\", weight neighbor votes by the inverse of their distance (for the distance, again, use\n",
"distance_metric). The weight for a neighbor of the point p is:\n",
"\n",
"$\n",
"w(p, n) = \\frac{1}{distance\\_metric(p, n)}\n",
"$\n",
"\n",
"Instead, if the default is chosen (weights=\"uniform\"), use the majority voting you already implemented\n",
"in Exercise 6.\n",
"\n",
"<img src=\"https://mlarchive.com/wp-content/uploads/2022/09/img5.png\">\n"
]
},
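{
"cell_type": "markdown",
"id": "e5f60718",
"metadata": {},
"source": [
"One possible shape for the weighted vote (an illustrative sketch, not the required implementation): for a single query point, sum the inverse-distance weights per label and return the label with the largest total."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e5f60719",
"metadata": {},
"outputs": [],
"source": [
"# Minimal sketch: inverse-distance weighted voting for one query point.\n",
"# labels, distances: arrays of shape (k,) for the k nearest neighbors.\n",
"def weighted_vote(labels, distances, eps=1e-9):\n",
"    weights = 1.0 / (distances + eps)  # eps guards against zero distance\n",
"    classes = np.unique(labels)\n",
"    totals = np.array([weights[labels == c].sum() for c in classes])\n",
"    return classes[totals.argmax()]\n"
]
},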
{
"cell_type": "code",
"execution_count": null,
"id": "a84262b9fd13d9f1",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"id": "54f1e2a662695741",
"metadata": {},
"source": [
"8. ($\\star$) Test the modularity of the implementation applying it on a different dataset. Ideally, you should\n",
"not change the code of your KNN python class.\n",
"- Download the MNIST dataset and retain only 100 samples per digit. You will end up with a dataset of 1000 samples.\n",
"- Define again four numpy arrays as you did in Exercises 2 and 3.\n",
"- Apply your KNN as you did for the Iris dataset.\n",
"- Evaluate the accuracy on MNISTs y_test."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b720ef714195eb68",
"metadata": {},
"outputs": [],
"source": [
"# download MNIST dataset\n",
"\n",
"# linux users\n",
"#! wget https://raw.githubusercontent.com/dbdmg/data-science-lab/master/datasets/mnist_test.csv -O mnist.csv\n",
"\n",
"# windows users\n",
"! pip install wget\n",
"import wget\n",
"wget.download(\"https://raw.githubusercontent.com/dbdmg/data-science-lab/master/datasets/mnist_test.csv\", \"mnist.csv\")\n"
]
},
{
"cell_type": "code",
"execution_count": 158,
"id": "77afcee410ef94ac",
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[0 0 0 ... 0 0 0]\n",
" [0 0 0 ... 0 0 0]\n",
" [0 0 0 ... 0 0 0]\n",
" ...\n",
" [9 0 0 ... 0 0 0]\n",
" [9 0 0 ... 0 0 0]\n",
" [9 0 0 ... 0 0 0]]\n",
"(1000, 784)\n",
"(1000,)\n"
]
}
],
"source": [
"# extracting MNIST dataset\n",
"import numpy as np\n",
"\n",
"raw_csv = np.loadtxt(\"mnist.csv\",\n",
" delimiter=\",\", dtype=int, converters={4:type_mapper})\n",
"\n",
"dataset_reduced = np.ndarray((0,785),dtype=int)\n",
"\n",
"for i in range(10):\n",
" items_with_digit = raw_csv[raw_csv[:,0] == i]\n",
" dataset_reduced = np.concatenate((dataset_reduced, items_with_digit[0:100,:]))\n",
"\n",
"print(dataset_reduced)\n",
"\n",
"x = dataset_reduced[:,1:]\n",
"y = dataset_reduced[:,0]\n",
"\n",
"print(x.shape)\n",
"print(y.shape)"
]
},
{
"cell_type": "code",
"execution_count": 160,
"id": "d1a0834dd8885a2b",
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# define four numpy arrays x_train, y_train, x_test, y_test\n",
"test_subset_indices = np.random.choice(len(y), size=int(len(y)*0.2), replace=False)\n",
"X_test = x[test_subset_indices]\n",
"Y_test = y[test_subset_indices]\n",
"\n",
"x[test_subset_indices]\n",
"\n",
"train_subset_indices = [i not in test_subset_indices for i in range(len(y))]\n",
"X_train = x[train_subset_indices]\n",
"Y_train = y[train_subset_indices]\n"
]
},
{
"cell_type": "code",
"execution_count": 171,
"id": "c03d2add840c1531",
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.885\n"
]
}
],
"source": [
"# Apply KNN on MNIST\n",
"knn = KNearestNeighbors(5)\n",
"knn.fit(X_train, Y_train)\n",
"predictions = knn.predict(X_test)\n",
"correct_guesses = predictions == Y_test\n",
"accuracy = np.count_nonzero(correct_guesses == True) / len(correct_guesses)\n",
"print(accuracy)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

File diff suppressed because one or more lines are too long

BIN
Labs/New_York_City_Map.PNG Normal file

Binary file not shown (new file, 85 KiB).