{ "cells": [ { "cell_type": "markdown", "id": "426a8016", "metadata": {}, "source": [ "
Lab 6: A classifier from scratch
" ] }, { "cell_type": "markdown", "id": "a39139f5", "metadata": {}, "source": [ "### Objective: Implement, use and evaluate a classifier (without using machine-learning libraries such as sklearn)\n", "1. **Logistic regression** is a binary classification method that maps a linear combination of the input variables, weighted by learned parameters, to one of two possible classes. Here, you will implement logistic regression from scratch to better understand how an ML algorithm works. Useful link: Wiki.\n", "2. **Performance evaluation metrics** are needed to compare the predictions with the true labels. Here, you will implement the confusion matrix, accuracy, precision, recall and F-measure. Useful link: Wiki." ] }, { "cell_type": "code", "execution_count": 1, "id": "b6bf32f9", "metadata": {}, "outputs": [], "source": [ "# import needed python libraries\n", "\n", "%matplotlib inline\n", "\n", "import pandas as pd\n", "import seaborn as sns\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import random" ] }, { "cell_type": "markdown", "id": "c0959af0", "metadata": {}, "source": [ "### 1. Dataset - TCP logs\n", "The dataset contains traffic information generated by **tstat**, an open-source passive network monitoring tool. It automates the collection of packet statistics for traffic aggregates in real time. As a passive tool, it is typically used for live monitoring of Internet links, where all transmitted packets are observed. For TCP, tstat identifies the start of a new flow when it observes a TCP three-way handshake. Similarly, it identifies the end of a TCP flow either when it sees the TCP connection teardown or when it observes no packets for some time (idle time). A flow is identified by the link between a sender and a receiver, i.e., by the tuple (IP_Protocol_Type, IP_Source_Address, Source_Port, IP_Destination_Address, Destination_Port). 
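To make the flow definition concrete, the toy Python sketch below (not tstat's actual implementation) groups packet records into flows keyed by the 5-tuple and starts a new flow when a key has been idle for too long. The record layout, the function name `group_into_flows`, the sample packets (documentation IP addresses) and the 10-second `IDLE_TIMEOUT` are hypothetical; handshake and teardown detection are omitted, and each direction of a connection gets its own key.

```python
IDLE_TIMEOUT = 10.0  # seconds; hypothetical value, not tstat's real setting

def group_into_flows(packets, idle_timeout=IDLE_TIMEOUT):
    # packets: iterable of (timestamp, proto, src_ip, src_port, dst_ip, dst_port, size) tuples
    flows = []   # one dict per flow, in order of creation
    active = {}  # 5-tuple -> index in flows of the most recent flow with that key
    for ts, proto, src, sport, dst, dport, size in sorted(packets):
        key = (proto, src, sport, dst, dport)
        idx = active.get(key)
        # open a new flow if the 5-tuple is new or has been idle for too long
        if idx is None or ts - flows[idx]['last'] > idle_timeout:
            flows.append({'key': key, 'first': ts, 'last': ts, 'packets': 0, 'bytes': 0})
            idx = len(flows) - 1
            active[key] = idx
        flow = flows[idx]
        flow['last'] = ts
        flow['packets'] += 1
        flow['bytes'] += size
    # simple per-flow statistics, similar in spirit to the columns of a tstat log
    for flow in flows:
        flow['duration'] = flow['last'] - flow['first']
    return flows

packets = [
    (0.0, 'TCP', '192.0.2.10', 51000, '198.51.100.7', 443, 120),
    (0.5, 'TCP', '192.0.2.10', 51000, '198.51.100.7', 443, 1418),
    (1.0, 'TCP', '192.0.2.10', 51001, '198.51.100.7', 443, 60),
    (30.0, 'TCP', '192.0.2.10', 51000, '198.51.100.7', 443, 60),
]
for flow in group_into_flows(packets):
    print(flow['key'], flow['packets'], flow['bytes'], flow['duration'])
```

In the sample, the last packet arrives 29.5 s after the previous packet with the same 5-tuple, so it opens a third flow.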
For each flow, tstat computes a number of statistics over all the packets transmitted in that flow, and then generates a log entry for the flow with multiple attributes (statistics). A log file is arranged as a simple table in which each column holds a specific piece of information and each row describes one flow, i.e., one connection. The log information is thus a summary of the flow properties. For instance, the TCP log contains columns such as the start time of a TCP connection, its duration, the number of sent and received packets, and the observed Round Trip Time (RTT).\n", "![](tstat.png)\n", "\n", "In this lab, since the focus is on developing logistic regression from scratch, we only consider a portion of the dataset for simplicity. The data can be found in `log_tcp_part.csv`: the last column is the class label, indicating whether the flow belongs to **google** or **youtube**, and the remaining columns are the features. Your job is to solve a binary classification task **from scratch**: predict the domain of each flow (row). This includes:\n", "- Build a logistic regression model,\n", "- Evaluate the performance." ] }, { "cell_type": "markdown", "id": "8fc1d837", "metadata": {}, "source": [ "1. Load the dataset.\n", "2. Get the list of features (columns 1 to 10).\n", "3. Add a new column with the numerical class labels -1 for google and 1 for youtube.\n", "4. Answer the following questions:\n", "   - How many features do we have?\n", "   - How many samples do we have in total?\n", "   - How many samples do we have for each class? Are they similar?" ] }, { "cell_type": "code", "execution_count": 2, "id": "70294ef9", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/tmp/ipykernel_226018/230400442.py:3: FutureWarning: Downcasting behavior in `replace` is deprecated and will be removed in a future version. To retain the old behavior, explicitly call `result.infer_objects(copy=False)`. 
To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`\n", "  df_tcp.replace({\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " c_msgsize_count c_pktsize6 c_msgsize4 s_msgsize4 s_pktsize2 \\\n", "0 1 0 0 0 1418 \n", "1 1 0 0 0 0 \n", "2 1 0 0 0 0 \n", "3 1 0 0 0 1418 \n", "4 1 0 0 0 1418 \n", "... ... ... ... ... ... \n", "19995 4 0 37 0 1418 \n", "19996 6 45 45 57 1418 \n", "19997 4 0 1205 0 531 \n", "19998 4 0 690 0 767 \n", "19999 1 0 0 0 0 \n", "\n", " s_rtt_cnt s_rtt_std s_msgsize5 c_msgsize6 c_sit3 class \n", "0 0 0.000000 0 0 0.000 -1 \n", "1 3 0.466732 0 0 0.000 -1 \n", "2 3 0.413304 0 0 0.000 -1 \n", "3 1 0.000000 0 0 0.000 -1 \n", "4 0 0.000000 0 0 0.000 -1 \n", "... ... ... ... ... ... ... \n", "19995 3 22.224528 0 0 3.334 1 \n", "19996 2 0.000000 45 45 1.252 1 \n", "19997 4 15.323660 0 0 4975.694 1 \n", "19998 4 17.997651 0 0 1719.125 1 \n", "19999 1 0.000000 0 0 0.000 1 \n", "\n", "[20000 rows x 11 columns]" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_tcp = pd.read_csv('log_tcp_part.csv')\n", "features = df_tcp.columns[:-1] # Remove class\n", "df_tcp.replace({\n", " \"class\": {\n", " \"google\": -1,\n", " \"youtube\": 1,\n", " }\n", "}, inplace=True)\n", "\n", "df_tcp" ] }, { "cell_type": "code", "execution_count": 41, "id": "48d85d94", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Number of features: 10\n", "Number of samples: 20000\n", "Number of samples of google: 10000\n", "Number of samples of youtube: 10000\n" ] } ], "source": [ "num_features = features.size\n", "num_samples = len(df_tcp)\n", "num_google = len(df_tcp.loc[df_tcp[\"class\"] == -1])\n", "num_youtube = len(df_tcp.loc[df_tcp[\"class\"] == 1])\n", "\n", "print(f\"Number of features: {num_features}\")\n", "print(f\"Number of samples: {num_samples}\")\n", "print(f\"Number of samples of google: {num_google}\")\n", "print(f\"Number of samples of youtube: {num_youtube}\")" ] }, { "cell_type": "markdown", "id": "c1c8cc80", "metadata": {}, "source": [ "### 2. 
Implement your logistic regression learning algorithm\n", "Here you will construct a class that defines two methods besides the constructor:\n", "- `fit`: performs ERM, i.e., learns the parameters of the model (the hypothesis h) from the training data with gradient descent.\n", "- `predict`: given one or more samples x, performs the inference $\\mathrm{sign}(h(x))$ to obtain class labels.\n", "\n", "Hints:\n", "\n", "- The linear function used in logistic regression is $h(x)=w^T x +b$, where b is a scalar bias.\n", "- Logistic loss, with labels y taking values -1 and 1: $L((x,y),h)=\\log(1+e^{-y h(x)})$\n", "- ERM: $\\min_{w,b} f(w,b)=\\frac{1}{m}\\sum_{i=1}^{m} \\log(1+e^{-y^{(i)} h(x^{(i)})})$\n", "- Gradient for weight: $\\nabla_w f(w,b) = \\frac{1}{m} \\sum_i \\frac{-y^{(i)}x^{(i)}}{1+e^{y^{(i)}h(x^{(i)})}}$\n", "- Gradient for bias: $\\nabla_b f(w,b)= \\frac{1}{m} \\sum_i \\frac{-y^{(i)}}{1+e^{y^{(i)}h(x^{(i)})}}$\n", "- Update the parameters: $w \\leftarrow w - \\alpha \\nabla_w f(w,b)$, $b \\leftarrow b - \\alpha \\nabla_b f(w,b)$, where $\\alpha$ is the learning rate.\n", "\n", "Notice that the sigmoid function $f(z) = \\frac{1}{1 + e^{-z}}$ appears multiple times, so you can also write a helper method for it to simplify the computation. 
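One possible way to write such a helper is sketched below with NumPy; the names `sigmoid`, `logistic_loss` and `logistic_gradients` are illustrative choices, not a required API. Writing the sigmoid as `exp(-logaddexp(0, -z))` gives the same value as 1/(1 + e^(-z)), but `np.logaddexp` evaluates log(1 + e^(-z)) stably and the outer exponent is never positive, so large positive or negative z no longer overflow. The gradients from the hints are then evaluated for labels in {-1, 1} through the identity 1/(1 + e^(y h)) = f(-y h).

```python
import numpy as np

def sigmoid(z):
    # f(z) = 1 / (1 + e^(-z)) written as exp(-log(1 + e^(-z)));
    # np.logaddexp(0, -z) = log(1 + e^(-z)) is computed stably and is >= 0,
    # so the outer exp never gets a positive argument (no overflow)
    return np.exp(-np.logaddexp(0.0, -np.asarray(z, dtype=float)))

def logistic_loss(X, y, w, b):
    # ERM objective: mean of log(1 + e^(-y h(x))) with h(x) = w^T x + b
    return np.mean(np.logaddexp(0.0, -y * (X @ w + b)))

def logistic_gradients(X, y, w, b):
    # X: (m, d) features, y: (m,) labels in {-1, 1}, w: (d,) weights, b: scalar bias
    m = X.shape[0]
    h = X @ w + b
    # derivative of the loss w.r.t. h: -y / (1 + e^(y h)) = -y * f(-y h)
    coef = -y * sigmoid(-y * h)
    grad_w = X.T @ coef / m
    grad_b = coef.sum() / m
    return grad_w, grad_b

# Smoke test on synthetic data: labels depend on the sign of the first feature
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = np.where(X[:, 0] > 0, 1.0, -1.0)
w, b = np.zeros(3), 0.0
for _ in range(200):
    grad_w, grad_b = logistic_gradients(X, y, w, b)
    w -= 0.5 * grad_w
    b -= 0.5 * grad_b
print(logistic_loss(X, y, w, b))
```

The synthetic data only serves as a quick check that the loss decreases; on the TCP features the same functions would typically be applied after standardizing the columns.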
By considering f(z) and the identity $\\frac{1}{1+e^{y h}} = f(-y h)$, the gradients rewrite as:\n", "\n", "- Gradient for weight: $\\nabla_w f(w,b) = \\frac{1}{m} \\sum_i -y^{(i)} f(-y^{(i)} h(x^{(i)})) x^{(i)}$\n", "- Gradient for bias: $\\nabla_b f(w,b) = \\frac{1}{m} \\sum_i -y^{(i)} f(-y^{(i)} h(x^{(i)}))$\n", "\n", "The shorter form $\\frac{1}{m} \\sum_i (f(h(x^{(i)})) - \\tilde{y}^{(i)}) x^{(i)}$ (and the same without $x^{(i)}$ for the bias) holds only for labels $\\tilde{y}^{(i)} = (1+y^{(i)})/2$ taking values 0 and 1; using the -1/1 labels of this lab in it gives a wrong gradient." ] }, { "cell_type": "code", "execution_count": 176, "id": "90a02f52", "metadata": {}, "outputs": [], "source": [ "def sigmoid(z):\n", "    # stable form of 1/(1+exp(-z)) = exp(-log(1+exp(-z))); np.logaddexp(0, -z) computes log(1+exp(-z)) without overflow\n", "    return np.exp(-np.logaddexp(0, -np.asarray(z, dtype=float)))\n", "\n", "class LogisticRegression:\n", "    def __init__(self, learning_rate, num_iterations):\n", "        self.learning_rate = learning_rate\n", "        self.num_iterations = num_iterations\n", "        self.w = None  # weights, set by fit()\n", "        self.b = None  # bias, set by fit()\n", "\n", "    def h(self, X):\n", "        return np.dot(X, self.w) + self.b\n", "\n", "    def dloss_dh(self, X, y):\n", "        # derivative of log(1+exp(-y*h)) w.r.t. h for labels y in {-1, 1}: -y/(1+exp(y*h)) = -y*sigmoid(-y*h)\n", "        return -y * sigmoid(-y * self.h(X))\n", "\n", "    def gradient_step_w(self, m, X, y):\n", "        return np.dot(X.T, self.dloss_dh(X, y)) / m\n", "\n", "    def gradient_step_b(self, m, X, y):\n", "        return self.dloss_dh(X, y).sum() / m\n", "\n", "    def fit(self, X, y):\n", "        X = np.asarray(X, dtype=float)\n", "        y = np.asarray(y, dtype=float)\n", "        self.w = np.zeros(X.shape[1])\n", "        self.b = 0.0\n", "        m = len(X)\n", "\n", "        for i in range(self.num_iterations):\n", "            # evaluate both gradients at the current (w, b) before updating either\n", "            w_step = self.gradient_step_w(m, X, y)\n", "            b_step = self.gradient_step_b(m, X, y)\n", "\n", "            self.w -= self.learning_rate*w_step\n", "            self.b -= self.learning_rate*b_step\n", "\n", "            # training accuracy after this iteration\n", "            accuracy = np.mean(self.predict(X) == y)\n", "            print(accuracy)\n", "\n", "    def predict(self, X):\n", "        if self.w is None or self.b is None:\n", "            raise ValueError(\"the model must be fitted before calling predict()\")\n", "\n", "        p = self.h(np.asarray(X, dtype=float))\n", "        # sign(h(x)), with h(x) = 0 mapped to class 1 so every output is a valid label\n", "        return np.where(p >= 0, 1, -1)" ] }, { "cell_type": "markdown", "id": "cc478b78", "metadata": {}, "source": [ "### 3. Use the model\n", "- Initialize your model with a learning rate of `0.1` and `100` iterations.\n", "- Fit your model on the training features and targets.\n", "- Predict the class labels from the features." 
] }, { "cell_type": "code", "execution_count": 177, "id": "af5a590d", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.5768\n", "0.5845333333333333\n", "0.5632\n", "0.6148\n", "0.5904666666666667\n", "0.617\n", "0.5955333333333334\n", "0.5915333333333334\n", "0.6082666666666666\n", "0.5925333333333334\n", "0.6115333333333334\n", "0.5924666666666667\n", "0.6012666666666666\n", "0.5922666666666667\n", "0.6109333333333333\n", "0.5952\n", "0.5922\n", "0.5996666666666667\n", "0.5904666666666667\n", "0.6062\n", "0.5915333333333334\n", "0.5988\n", "0.5916\n", "0.5979333333333333\n", "0.5917333333333333\n", "0.5962\n", "0.5933333333333334\n", "0.5955333333333334\n", "0.5945333333333334\n", "0.5947333333333333\n", "0.5946\n", "0.5946666666666667\n", "0.5947333333333333\n", "0.5946666666666667\n", "0.5946666666666667\n", "0.5946666666666667\n", "0.5946\n", "0.5946\n", "0.5946666666666667\n", "0.5945333333333334\n", "0.5946666666666667\n", "0.5945333333333334\n", "0.5944666666666667\n", "0.5946\n", "0.5945333333333334\n", "0.5946\n", "0.5946\n", "0.5946666666666667\n", "0.5946\n", "0.5945333333333334\n", "0.5946666666666667\n", "0.5946\n", "0.5946\n", "0.5946\n", "0.5946666666666667\n", "0.5945333333333334\n", "0.5946\n", "0.5946666666666667\n", "0.5945333333333334\n", "0.5946666666666667\n", "0.5946666666666667\n", "0.5946\n", "0.5947333333333333\n", "0.5946666666666667\n", "0.5947333333333333\n", "0.5946\n", "0.5947333333333333\n", "0.5946666666666667\n", "0.5946\n", "0.5946666666666667\n", "0.5946\n", "0.5946\n", "0.5946\n", "0.5946\n", "0.5946\n", "0.5946666666666667\n", "0.5947333333333333\n", "0.5946666666666667\n", "0.5947333333333333\n", "0.5948\n", "0.5948\n", "0.5948\n", "0.5948\n", "0.5948666666666667\n", "0.5947333333333333\n", "0.5948\n", "0.5948666666666667\n", "0.5947333333333333\n", "0.5948666666666667\n", "0.5947333333333333\n", "0.5947333333333333\n", "0.5947333333333333\n", "0.5947333333333333\n", "0.5948\n", 
"0.5946666666666667\n", "0.5948\n", "0.5947333333333333\n", "0.5946666666666667\n", "0.5948\n", "0.5947333333333333\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n", " return 1/(1+np.exp(np.negative(z)))\n", "/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n", " return 1/(1+np.exp(np.negative(z)))\n", "/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n", " return 1/(1+np.exp(np.negative(z)))\n", "/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n", " return 1/(1+np.exp(np.negative(z)))\n", "/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n", " return 1/(1+np.exp(np.negative(z)))\n", "/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n", " return 1/(1+np.exp(np.negative(z)))\n", "/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n", " return 1/(1+np.exp(np.negative(z)))\n", "/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n", " return 1/(1+np.exp(np.negative(z)))\n", "/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n", " return 1/(1+np.exp(np.negative(z)))\n", "/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n", " return 1/(1+np.exp(np.negative(z)))\n", "/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n", " return 1/(1+np.exp(np.negative(z)))\n", "/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n", " return 1/(1+np.exp(np.negative(z)))\n", "/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n", " return 1/(1+np.exp(np.negative(z)))\n", "/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n", " return 1/(1+np.exp(np.negative(z)))\n", 
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n", " return 1/(1+np.exp(np.negative(z)))\n", "/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n", " return 1/(1+np.exp(np.negative(z)))\n", "/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n", " return 1/(1+np.exp(np.negative(z)))\n", "/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n", " return 1/(1+np.exp(np.negative(z)))\n", "/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n", " return 1/(1+np.exp(np.negative(z)))\n", "/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n", " return 1/(1+np.exp(np.negative(z)))\n", "/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n", " return 1/(1+np.exp(np.negative(z)))\n", "/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n", " return 1/(1+np.exp(np.negative(z)))\n", "/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n", " return 1/(1+np.exp(np.negative(z)))\n", "/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n", " return 1/(1+np.exp(np.negative(z)))\n", "/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n", " return 1/(1+np.exp(np.negative(z)))\n", "/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n", " return 1/(1+np.exp(np.negative(z)))\n", "/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n", " return 1/(1+np.exp(np.negative(z)))\n", "/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n", " return 1/(1+np.exp(np.negative(z)))\n", "/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n", " return 1/(1+np.exp(np.negative(z)))\n", "/tmp/ipykernel_226018/1018497028.py:2: 
RuntimeWarning: overflow encountered in exp\n", "  return 1/(1+np.exp(np.negative(z)))\n" ] } ], "source": [ "# split without sklearn: a shuffled 75% of the rows (fixed seed) for training, the remaining 25% for testing\n", "train_df = df_tcp.sample(frac=0.75, random_state=0)\n", "test_df = df_tcp.drop(train_df.index)\n", "\n", "x_train, y_train = train_df[features], train_df[\"class\"]\n", "x_test, y_test = test_df[features], test_df[\"class\"]\n", "\n", "# standardize with training-set statistics only: the raw features have very different scales\n", "# (e.g. c_sit3 reaches the thousands), which can overflow a naive sigmoid and slows down gradient descent\n", "mu = x_train.mean()\n", "sigma = x_train.std(ddof=0).replace(0, 1)  # leave constant columns unscaled\n", "x_train = (x_train - mu) / sigma\n", "x_test = (x_test - mu) / sigma\n", "\n", "lr = LogisticRegression(0.1, 100)\n", "lr.fit(x_train, y_train.values)" ] }, { "cell_type": "code", "execution_count": 174, "id": "beda67a9", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.6056" ] }, "execution_count": 174, "metadata": {}, "output_type": "execute_result" } ], "source": [ "y_predict = np.transpose(lr.predict(x_test))==y_test.values\n", "correct_predictions = np.count_nonzero(y_predict == True)\n", "accuracy = correct_predictions/len(y_test)\n", "\n", "accuracy" ] }, { "cell_type": "code", "execution_count": 175, "id": "8db63dad", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.5965333333333334" ] }, "execution_count": 175, "metadata": {}, "output_type": "execute_result" } ], "source": [ "y_predict = np.transpose(lr.predict(x_train))==y_train.values\n", "correct_predictions = np.count_nonzero(y_predict == True)\n", "accuracy = correct_predictions/len(y_train)\n", "\n", "accuracy" ] }, { "cell_type": "markdown", "id": "bc5ad9e7", "metadata": {}, "source": [ "### 4. Model evaluation\n", "With the predicted class labels and the ground truth, we now evaluate the model performance through the confusion matrix and numerical metrics. Specifically, you need to derive the following:\n", "- Confusion matrix: indicate the quantity that each cell of the table holds. Here positive is class 1 and negative is class -1:\n", "\\begin{array}{|c|c|c|}\n", "\\hline\n", " & \\textbf{Predicted Positive} & \\textbf{Predicted Negative} \\\\\n", "\\hline\n", "\\textbf{Actual Positive} & \\text{True Positive (TP)} & \\text{False Negative (FN)} \\\\\n", "\\hline\n", "\\textbf{Actual Negative} & \\text{False Positive (FP)} & \\text{True Negative (TN)} \\\\\n", "\\hline\n", "\\end{array}\n", "- Precision of each class (treating that class as the positive one) and the average value:\n", "$\\frac{\\text{True Positive (TP)}}{\\text{True Positive (TP) + False Positive (FP)}}$\n", "- Recall of each class and the average value:\n", "$\\frac{\\text{True Positive (TP)}}{\\text{True Positive (TP) + False Negative (FN)}}$\n", "- F1-score of each class and the average value:\n", "$F_1 = \\frac{2 \\times \\text{Precision} \\times \\text{Recall}}{\\text{Precision} + \\text{Recall}}$\n", "- Accuracy:\n", "$\\frac{\\text{True Positive (TP) + True Negative (TN)}}{\\text{True Positive (TP) + True Negative (TN) + False Positive (FP) + False Negative (FN)}}$\n", "- Answer the following questions:\n", "  - Is the performance the same for both classes? 
If not, which one performs better?\n", "  - Change the learning rate or the number of iterations. Does the performance change? Is it better or worse? Why?" ] }, { "cell_type": "code", "execution_count": null, "id": "15b74982", "metadata": {}, "outputs": [], "source": [ "# your answers here" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.2" }, "varInspector": { "cols": { "lenName": 16, "lenType": 16, "lenVar": 40 }, "kernels_config": { "python": { "delete_cmd_postfix": "", "delete_cmd_prefix": "del ", "library": "var_list.py", "varRefreshCmd": "print(var_dic_list())" }, "r": { "delete_cmd_postfix": ") ", "delete_cmd_prefix": "rm(", "library": "var_list.r", "varRefreshCmd": "cat(var_dic_list()) " } }, "types_to_exclude": [ "module", "function", "builtin_function_or_method", "instance", "_Feature" ], "window_display": false } }, "nbformat": 4, "nbformat_minor": 5 }