{
"cells": [
{
"cell_type": "markdown",
"id": "426a8016",
"metadata": {},
"source": [
"<center><b><font size=6>Lab-6 A classifier from scratch<b><center>"
]
},
{
"cell_type": "markdown",
"id": "a39139f5",
"metadata": {},
"source": [
"### Objective: Implement, use and evaluate a classifier (without using specific libraries such as sklearn)\n",
"1. **Logistic regression** is a binary classification method that maps a linear combination of parameters and variables into two possible classes. Here, you will implement the logistic regression from scratch to better understand how an ML algorithm works. Useful link: <a href=\"https://en.wikipedia.org/wiki/Logistic_regression\">Wiki</a>.\n",
"2. **Performance evaluation metrics** are needed to evaluate the outcome of prediction with respect to true labels. Here, you will implement confusion matrix, accuracy, precision, recall and F-measure. Useful link: <a href=\"https://en.wikipedia.org/wiki/Confusion_matrix\">Wiki</a>."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "b6bf32f9",
"metadata": {},
"outputs": [],
"source": [
"# import needed python libraries\n",
"\n",
"%matplotlib inline\n",
"\n",
"import pandas as pd\n",
"import seaborn as sns\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"import random"
]
},
{
"cell_type": "markdown",
"id": "c0959af0",
"metadata": {},
"source": [
"### 1. Dataset - TCP logs\n",
"The dataset contains traffic information generated by an open-source passive network monitoring tool, namely **tstat**. It automates the collection of packet statistics of traffic aggregates, using real-time monitoring features. Being a passive tool, the typical usage scenario is live monitoring of Internet links, in which all transmitted packets are observed. In case of TCP, Tstat identifies a new flow start when it observes a TCP three-way handshake. Similarly, it identifies a TCP flow end either when it sees the TCP connection teardown, or when it doesnt observe packets for some time (idle time). A flow is defined by a unique link between the sender and receiver, e.g., a tuple of <em>(IP_Protocol_Type, IP_Source_Address, Source_Port, IP_Destination_Address, Destination_Port)</em>. For a specific flow, tstat calculates a number of statistics of all the packets transmitted over this flow, and then generate a log for such flow with multiple attributes (statistics). A log file is arranged as a simple table where each column is associated to specific information and each row reports the flow during a connection. The log information is a summary of the flow properties. For instance, in the TCP log we can find columns like the starting time of a TCP connection, its duration, the number of sent and received packets, the observed Round Trip Time.\n",
"![](tstat.png)\n",
"\n",
"In this lab, since the focus is on the development of logistic regression from scratch, we only consider a portion of the dataset for simplicity. The data can be found in `log_tcp_part.csv`, in which there are multiple columns, the last one is the class label, indicating the flow is from either **google** or **youtube**, and the rest are features. Your job is a binary classification task to classify the domain of each flow (row) **from scratch**, including:\n",
"- Build a logistic regression model,\n",
"- Evaluate the performance."
]
},
{
"cell_type": "markdown",
"id": "8fc1d837",
"metadata": {},
"source": [
"1. Load the dataset.\n",
"2. Get the list of features (columns 1 to 10).\n",
"3. Add a new column and assign numerical class labels of -1 and 1 to google and youtube.\n",
"4. Answering the following questions:\n",
" - How many features do we have?\n",
" - How many samples do we have in total?\n",
" - How many samples do we have for each class? Are they similar?"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "70294ef9",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/tmp/ipykernel_226018/230400442.py:3: FutureWarning: Downcasting behavior in `replace` is deprecated and will be removed in a future version. To retain the old behavior, explicitly call `result.infer_objects(copy=False)`. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`\n",
" df_tcp.replace({\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>c_msgsize_count</th>\n",
" <th>c_pktsize6</th>\n",
" <th>c_msgsize4</th>\n",
" <th>s_msgsize4</th>\n",
" <th>s_pktsize2</th>\n",
" <th>s_rtt_cnt</th>\n",
" <th>s_rtt_std</th>\n",
" <th>s_msgsize5</th>\n",
" <th>c_msgsize6</th>\n",
" <th>c_sit3</th>\n",
" <th>class</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1418</td>\n",
" <td>0</td>\n",
" <td>0.000000</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0.000</td>\n",
" <td>-1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>0.466732</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0.000</td>\n",
" <td>-1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>0.413304</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0.000</td>\n",
" <td>-1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1418</td>\n",
" <td>1</td>\n",
" <td>0.000000</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0.000</td>\n",
" <td>-1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1418</td>\n",
" <td>0</td>\n",
" <td>0.000000</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0.000</td>\n",
" <td>-1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>19995</th>\n",
" <td>4</td>\n",
" <td>0</td>\n",
" <td>37</td>\n",
" <td>0</td>\n",
" <td>1418</td>\n",
" <td>3</td>\n",
" <td>22.224528</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>3.334</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>19996</th>\n",
" <td>6</td>\n",
" <td>45</td>\n",
" <td>45</td>\n",
" <td>57</td>\n",
" <td>1418</td>\n",
" <td>2</td>\n",
" <td>0.000000</td>\n",
" <td>45</td>\n",
" <td>45</td>\n",
" <td>1.252</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>19997</th>\n",
" <td>4</td>\n",
" <td>0</td>\n",
" <td>1205</td>\n",
" <td>0</td>\n",
" <td>531</td>\n",
" <td>4</td>\n",
" <td>15.323660</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>4975.694</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>19998</th>\n",
" <td>4</td>\n",
" <td>0</td>\n",
" <td>690</td>\n",
" <td>0</td>\n",
" <td>767</td>\n",
" <td>4</td>\n",
" <td>17.997651</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1719.125</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>19999</th>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0.000000</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0.000</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>20000 rows × 11 columns</p>\n",
"</div>"
],
"text/plain": [
" c_msgsize_count c_pktsize6 c_msgsize4 s_msgsize4 s_pktsize2 \\\n",
"0 1 0 0 0 1418 \n",
"1 1 0 0 0 0 \n",
"2 1 0 0 0 0 \n",
"3 1 0 0 0 1418 \n",
"4 1 0 0 0 1418 \n",
"... ... ... ... ... ... \n",
"19995 4 0 37 0 1418 \n",
"19996 6 45 45 57 1418 \n",
"19997 4 0 1205 0 531 \n",
"19998 4 0 690 0 767 \n",
"19999 1 0 0 0 0 \n",
"\n",
" s_rtt_cnt s_rtt_std s_msgsize5 c_msgsize6 c_sit3 class \n",
"0 0 0.000000 0 0 0.000 -1 \n",
"1 3 0.466732 0 0 0.000 -1 \n",
"2 3 0.413304 0 0 0.000 -1 \n",
"3 1 0.000000 0 0 0.000 -1 \n",
"4 0 0.000000 0 0 0.000 -1 \n",
"... ... ... ... ... ... ... \n",
"19995 3 22.224528 0 0 3.334 1 \n",
"19996 2 0.000000 45 45 1.252 1 \n",
"19997 4 15.323660 0 0 4975.694 1 \n",
"19998 4 17.997651 0 0 1719.125 1 \n",
"19999 1 0.000000 0 0 0.000 1 \n",
"\n",
"[20000 rows x 11 columns]"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_tcp = pd.read_csv('log_tcp_part.csv')\n",
"features = df_tcp.columns[:-1] # Remove class\n",
"df_tcp.replace({\n",
" \"class\": {\n",
" \"google\": -1,\n",
" \"youtube\": 1,\n",
" }\n",
"}, inplace=True)\n",
"\n",
"df_tcp"
]
},
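{
"cell_type": "markdown",
"id": "b7e2d101",
"metadata": {},
"source": [
"The `FutureWarning` above comes from the deprecated downcasting behaviour of `replace`. A possible warning-free alternative is to map the labels explicitly; the sketch below uses a separate copy of the data (`df_alt` is just an illustrative name), so it does not modify `df_tcp`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b7e2d102",
"metadata": {},
"outputs": [],
"source": [
"# A sketch of an alternative label encoding that avoids the pandas FutureWarning:\n",
"# re-read the CSV into a separate frame and map the string labels to -1/1 explicitly.\n",
"df_alt = pd.read_csv('log_tcp_part.csv')\n",
"df_alt[\"class\"] = df_alt[\"class\"].map({\"google\": -1, \"youtube\": 1})\n",
"df_alt[\"class\"].value_counts()"
]
},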
{
"cell_type": "code",
"execution_count": 41,
"id": "48d85d94",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Number of features: 10\n",
"Number of samples: 20000\n",
"Number of samples of google: 10000\n",
"Number of samples of youtube: 10000\n"
]
}
],
"source": [
"num_features = features.size\n",
"num_samples = len(df_tcp)\n",
"num_google = len(df_tcp.loc[df_tcp[\"class\"] == -1])\n",
"num_youtube = len(df_tcp.loc[df_tcp[\"class\"] == 1])\n",
"\n",
"print(f\"Number of features: {num_features}\")\n",
"print(f\"Number of samples: {num_samples}\")\n",
"print(f\"Number of samples of google: {num_google}\")\n",
"print(f\"Number of samples of youtube: {num_youtube}\")"
]
},
{
"cell_type": "markdown",
"id": "c1c8cc80",
"metadata": {},
"source": [
"### 2. Implement your logistic regression learning algorithm\n",
"Here you will need to construct a class in which you need to define two functions besides the class initialization:\n",
"- `fit`. In this method you will perform ERM. Learn the parameters of the model (i.e., the hypothesis h) from training with gradient descent\n",
"- `predict`. In this method given one sample x (or more) you will perform the inference $sign(h(x))$ to obtain class labels.\n",
"\n",
"Hints:\n",
"\n",
"- The linear function used in the logistic regression is the following: $h(x)=w^T x +b $, where b is a scalar bias.\n",
"- Logistic loss: $L((x,y),h)=\\log(1+e^{-y h(x)})$\n",
"- ERM: $\\min_{w,b} f(w,b)=\\frac{1}{m}\\sum_{i=1}^{m} \\log(1+e^{-y^{(i)} h(x^{(i)})})$\n",
"- Gradient for weight: $\\nabla_w f(w,b) = \\frac{1}{m} \\sum_i \\frac{-y^{(i)}x^{(i)}}{(1+e^{y^{(i)}h(x^{(i)})})}$\n",
"- Gradient for bias: $\\nabla_b f(w,b)= \\frac{1}{m} \\sum_i \\frac{-y^{(i)}}{(1+e^{y^{(i)}h(x^{(i)})})}$\n",
"- Update the parameters: $w \\leftarrow w - \\alpha \\nabla w$, $b \\leftarrow b - \\alpha \\nabla b$\n",
"\n",
"Notice that the sigmoid function $f(z) = \\frac{1}{1 + e^{-z}}$ appears multiple times. You can write also a method for the sigmoid function to help you in the computation. By considering f(z), the gradients rewrite as:\n",
"\n",
"- Gradient for weight: $\\nabla_w f(w,b) = \\frac{1}{m} \\sum_i ({f(h(x^{(i)})) - y^{(i)}})x^{(i)}$\n",
"- Gradient for bias: $\\nabla_b f(w,b) = \\frac{1}{m} \\sum_i ({f(h(x^{(i)})) - y^{(i)}})$"
]
},
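{
"cell_type": "markdown",
"id": "c4d5e201",
"metadata": {},
"source": [
"A small implementation note: with unscaled features $h(x)$ can become large in magnitude, and $e^{-z}$ may overflow in floating point (as the warnings further below show). One possible numerically stable sigmoid is sketched here (not required by the exercise; `stable_sigmoid` is just an illustrative name): it only ever calls `np.exp` on non-positive arguments."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c4d5e202",
"metadata": {},
"outputs": [],
"source": [
"# A possible numerically stable sigmoid (a sketch): np.exp is only called on -|z| <= 0,\n",
"# so it cannot overflow; both branches are algebraically equal to 1 / (1 + e^{-z}).\n",
"def stable_sigmoid(z):\n",
"    z = np.asarray(z, dtype=float)\n",
"    e = np.exp(-np.abs(z))\n",
"    return np.where(z >= 0, 1.0 / (1.0 + e), e / (1.0 + e))\n",
"\n",
"# e.g. stable_sigmoid(np.array([-1000.0, 0.0, 1000.0])) -> array([0. , 0.5, 1. ])"
]
},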
{
"cell_type": "code",
"execution_count": 176,
"id": "90a02f52",
"metadata": {},
"outputs": [],
"source": [
"def sigmoid(z):\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"\n",
"class LogisticRegression:\n",
" def __init__(self, learning_rate, num_iterations):\n",
" self.learning_rate = learning_rate\n",
" self.num_iterations = num_iterations\n",
"\n",
"\n",
" def h(self, X):\n",
" return np.dot(X, self.w) + self.b\n",
" \n",
" \n",
" def gradient_step_w(self, m, X, y):\n",
" h = self.h(X)\n",
" f = sigmoid(h)\n",
" s = np.dot(X.T, np.subtract(f, y))\n",
"\n",
" return s/m\n",
" \n",
"\n",
" def gradient_step_b(self, m, X, y):\n",
" h = self.h(X)\n",
" f = sigmoid(h)\n",
" s = np.subtract(f, y).sum()\n",
" \n",
" return s/m\n",
"\n",
"\n",
" def fit(self, X, y):\n",
" self.w = np.zeros((X.shape[1]))\n",
" self.b = 0\n",
" m = len(X)\n",
" \n",
" for i in range(self.num_iterations):\n",
" w_step = self.gradient_step_w(m, X, y)\n",
" b_step = self.gradient_step_b(m, X, y)\n",
"\n",
" self.w -= self.learning_rate*w_step\n",
" self.b -= self.learning_rate*b_step\n",
"\n",
" y_predict = np.transpose(self.predict(X))==y\n",
" correct_predictions = np.count_nonzero(y_predict == True)\n",
" accuracy = correct_predictions/len(y)\n",
" print(accuracy)\n",
" \n",
"\n",
" def predict(self, X):\n",
" if self.w is None or self.b is None:\n",
" raise ValueError\n",
" \n",
" p = self.h(X)\n",
" return np.sign(p)"
]
},
{
"cell_type": "markdown",
"id": "cc478b78",
"metadata": {},
"source": [
"### 3. Use the model\n",
"- Initialize your model with predefined learning rate of `0.1` and iterations of `100`.\n",
"- Fit your model with features and targets.\n",
"- Get the prediction with features."
]
},
{
"cell_type": "code",
"execution_count": 177,
"id": "af5a590d",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.5768\n",
"0.5845333333333333\n",
"0.5632\n",
"0.6148\n",
"0.5904666666666667\n",
"0.617\n",
"0.5955333333333334\n",
"0.5915333333333334\n",
"0.6082666666666666\n",
"0.5925333333333334\n",
"0.6115333333333334\n",
"0.5924666666666667\n",
"0.6012666666666666\n",
"0.5922666666666667\n",
"0.6109333333333333\n",
"0.5952\n",
"0.5922\n",
"0.5996666666666667\n",
"0.5904666666666667\n",
"0.6062\n",
"0.5915333333333334\n",
"0.5988\n",
"0.5916\n",
"0.5979333333333333\n",
"0.5917333333333333\n",
"0.5962\n",
"0.5933333333333334\n",
"0.5955333333333334\n",
"0.5945333333333334\n",
"0.5947333333333333\n",
"0.5946\n",
"0.5946666666666667\n",
"0.5947333333333333\n",
"0.5946666666666667\n",
"0.5946666666666667\n",
"0.5946666666666667\n",
"0.5946\n",
"0.5946\n",
"0.5946666666666667\n",
"0.5945333333333334\n",
"0.5946666666666667\n",
"0.5945333333333334\n",
"0.5944666666666667\n",
"0.5946\n",
"0.5945333333333334\n",
"0.5946\n",
"0.5946\n",
"0.5946666666666667\n",
"0.5946\n",
"0.5945333333333334\n",
"0.5946666666666667\n",
"0.5946\n",
"0.5946\n",
"0.5946\n",
"0.5946666666666667\n",
"0.5945333333333334\n",
"0.5946\n",
"0.5946666666666667\n",
"0.5945333333333334\n",
"0.5946666666666667\n",
"0.5946666666666667\n",
"0.5946\n",
"0.5947333333333333\n",
"0.5946666666666667\n",
"0.5947333333333333\n",
"0.5946\n",
"0.5947333333333333\n",
"0.5946666666666667\n",
"0.5946\n",
"0.5946666666666667\n",
"0.5946\n",
"0.5946\n",
"0.5946\n",
"0.5946\n",
"0.5946\n",
"0.5946666666666667\n",
"0.5947333333333333\n",
"0.5946666666666667\n",
"0.5947333333333333\n",
"0.5948\n",
"0.5948\n",
"0.5948\n",
"0.5948\n",
"0.5948666666666667\n",
"0.5947333333333333\n",
"0.5948\n",
"0.5948666666666667\n",
"0.5947333333333333\n",
"0.5948666666666667\n",
"0.5947333333333333\n",
"0.5947333333333333\n",
"0.5947333333333333\n",
"0.5947333333333333\n",
"0.5948\n",
"0.5946666666666667\n",
"0.5948\n",
"0.5947333333333333\n",
"0.5946666666666667\n",
"0.5948\n",
"0.5947333333333333\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n"
]
}
],
"source": [
"from sklearn.model_selection import train_test_split\n",
"\n",
"x_train, x_test, y_train, y_test = train_test_split(df_tcp.drop(columns=[\"class\"], inplace=False), df_tcp[\"class\"])\n",
"\n",
"lr = LogisticRegression(0.1, 100)\n",
"lr.fit(x_train, y_train.values)"
]
},
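{
"cell_type": "markdown",
"id": "d9a0f301",
"metadata": {},
"source": [
"The `RuntimeWarning` messages above appear because the features have very different scales, so $h(x)$ can become large enough to overflow `np.exp` inside the sigmoid. One possible remedy, sketched below and not part of the original exercise, is to standardize each feature with the mean and standard deviation of the training split before fitting (`x_train_std`, `x_test_std` and `lr_std` are illustrative names)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d9a0f302",
"metadata": {},
"outputs": [],
"source": [
"# A sketch: z-score standardization using statistics from the training split only\n",
"mu = x_train.mean()\n",
"sigma = x_train.std().replace(0, 1)  # guard against constant columns\n",
"x_train_std = (x_train - mu) / sigma\n",
"x_test_std = (x_test - mu) / sigma\n",
"\n",
"# Refit the same model on the standardized features\n",
"lr_std = LogisticRegression(0.1, 100)\n",
"lr_std.fit(x_train_std, y_train.values)"
]
},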
{
"cell_type": "code",
"execution_count": 174,
"id": "beda67a9",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.6056"
]
},
"execution_count": 174,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"y_predict = np.transpose(lr.predict(x_test))==y_test.values\n",
"correct_predictions = np.count_nonzero(y_predict == True)\n",
"accuracy = correct_predictions/len(y_test)\n",
"\n",
"accuracy"
]
},
{
"cell_type": "code",
"execution_count": 175,
"id": "8db63dad",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.5965333333333334"
]
},
"execution_count": 175,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"y_predict = np.transpose(lr.predict(x_train))==y_train.values\n",
"correct_predictions = np.count_nonzero(y_predict == True)\n",
"accuracy = correct_predictions/len(y_train)\n",
"\n",
"accuracy"
]
},
{
"cell_type": "markdown",
"id": "bc5ad9e7",
"metadata": {},
"source": [
"### 4. Model evaluation\n",
"With predicted class labels and ground truths, we now evaluate the model performance through confusion matrix and numerical metrics. Specifically, you need to derive the following:\n",
"- Confusion matrix - Note that, you should indicate the corresponding quantity of each element in the table. Here positive is class 1 and negative is class -1:\n",
"\\begin{array}{|c|c|c|}\n",
"\\hline\n",
" & \\textbf{Predicted Positive} & \\textbf{Predicted Negative} \\\\\n",
"\\hline\n",
"\\textbf{Actual Positive} & \\text{True Positive (TP)} & \\text{False Negative (FN)} \\\\\n",
"\\hline\n",
"\\textbf{Actual Negative} & \\text{False Positive (FP)} & \\text{True Negative (TN)} \\\\\n",
"\\hline\n",
"\\end{array}\n",
"- Precision of each class and the average value:\n",
"$\\frac{\\text{True Positive (TP)}}{\\text{True Positive (TP) + False Positive (FP)}}$\n",
"- Recall of each class and the average value:\n",
"$\\frac{\\text{True Positive (TP)}}{\\text{True Positive (TP) + False Negative (FN)}}$\n",
"- F1-score of each class and the average value:\n",
"$F_1 = \\frac{2 \\times \\text{Precision} \\times \\text{Recall}}{\\text{Precision} + \\text{Recall}}$\n",
"- Accuracy:\n",
"$\\frac{\\text{True Positive (TP) + True Negative (TN)}}{\\text{True Positive (TP) + True Negative (TN) + False Positive (FP) + False Negative (FN)}}$\n",
"- Answering the following questions:\n",
" - Do you have same performance between classes? If not, which one performs better?\n",
" - Change the parameters of learning rate or number of iterations. Do you have same performance? Better or Worse? Why?"
]
},
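{
"cell_type": "markdown",
"id": "e1b2c401",
"metadata": {},
"source": [
"One possible from-scratch sketch of these metrics is given below, assuming the fitted `lr`, `x_test` and `y_test` from the cells above (positive = class 1, i.e. youtube; negative = class -1, i.e. google)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e1b2c402",
"metadata": {},
"outputs": [],
"source": [
"# A sketch of the confusion matrix and metrics, computed from scratch on the test split\n",
"y_true = y_test.values\n",
"y_pred = lr.predict(x_test)\n",
"\n",
"tp = int(np.sum((y_pred == 1) & (y_true == 1)))\n",
"fn = int(np.sum((y_pred == -1) & (y_true == 1)))\n",
"fp = int(np.sum((y_pred == 1) & (y_true == -1)))\n",
"tn = int(np.sum((y_pred == -1) & (y_true == -1)))\n",
"\n",
"print(\"Confusion matrix\")\n",
"print(\"            pred +1   pred -1\")\n",
"print(f\"actual +1 {tp:9d} {fn:9d}\")\n",
"print(f\"actual -1 {fp:9d} {tn:9d}\")\n",
"\n",
"def precision_recall_f1(tp_, fp_, fn_):\n",
"    precision = tp_ / (tp_ + fp_) if (tp_ + fp_) > 0 else 0.0\n",
"    recall = tp_ / (tp_ + fn_) if (tp_ + fn_) > 0 else 0.0\n",
"    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) > 0 else 0.0\n",
"    return precision, recall, f1\n",
"\n",
"# Per-class metrics: for class -1 the roles of the counts are swapped\n",
"p_pos, r_pos, f1_pos = precision_recall_f1(tp, fp, fn)\n",
"p_neg, r_neg, f1_neg = precision_recall_f1(tn, fn, fp)\n",
"accuracy = (tp + tn) / (tp + tn + fp + fn)\n",
"\n",
"print(f\"class +1 (youtube): precision={p_pos:.3f}  recall={r_pos:.3f}  F1={f1_pos:.3f}\")\n",
"print(f\"class -1 (google):  precision={p_neg:.3f}  recall={r_neg:.3f}  F1={f1_neg:.3f}\")\n",
"print(f\"average:            precision={(p_pos + p_neg) / 2:.3f}  recall={(r_pos + r_neg) / 2:.3f}  F1={(f1_pos + f1_neg) / 2:.3f}\")\n",
"print(f\"accuracy={accuracy:.3f}\")"
]
},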
{
"cell_type": "code",
"execution_count": null,
"id": "15b74982",
"metadata": {},
"outputs": [],
"source": [
"# your answers here"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.2"
},
"varInspector": {
"cols": {
"lenName": 16,
"lenType": 16,
"lenVar": 40
},
"kernels_config": {
"python": {
"delete_cmd_postfix": "",
"delete_cmd_prefix": "del ",
"library": "var_list.py",
"varRefreshCmd": "print(var_dic_list())"
},
"r": {
"delete_cmd_postfix": ") ",
"delete_cmd_prefix": "rm(",
"library": "var_list.r",
"varRefreshCmd": "cat(var_dic_list()) "
}
},
"types_to_exclude": [
"module",
"function",
"builtin_function_or_method",
"instance",
"_Feature"
],
"window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 5
}