{
"cells": [
{
"cell_type": "markdown",
"id": "426a8016",
"metadata": {},
"source": [
"<center><b><font size=6>Lab-6 A classifier from scratch<b><center>"
]
},
{
"cell_type": "markdown",
"id": "a39139f5",
"metadata": {},
"source": [
"### Objective: Implement, use and evaluate a classifier (without using specific libraries such as sklearn)\n",
"1. **Logistic regression** is a binary classification method that maps a linear combination of parameters and variables into two possible classes. Here, you will implement the logistic regression from scratch to better understand how an ML algorithm works. Useful link: <a href=\"https://en.wikipedia.org/wiki/Logistic_regression\">Wiki</a>.\n",
"2. **Performance evaluation metrics** are needed to evaluate the outcome of prediction with respect to true labels. Here, you will implement confusion matrix, accuracy, precision, recall and F-measure. Useful link: <a href=\"https://en.wikipedia.org/wiki/Confusion_matrix\">Wiki</a>."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "b6bf32f9",
"metadata": {},
"outputs": [],
"source": [
"# import needed python libraries\n",
"\n",
"%matplotlib inline\n",
"\n",
"import pandas as pd\n",
"import seaborn as sns\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"import random"
]
},
{
"cell_type": "markdown",
"id": "c0959af0",
"metadata": {},
"source": [
"### 1. Dataset - TCP logs\n",
"The dataset contains traffic information generated by an open-source passive network monitoring tool, namely **tstat**. It automates the collection of packet statistics of traffic aggregates, using real-time monitoring features. Being a passive tool, the typical usage scenario is live monitoring of Internet links, in which all transmitted packets are observed. In case of TCP, Tstat identifies a new flow start when it observes a TCP three-way handshake. Similarly, it identifies a TCP flow end either when it sees the TCP connection teardown, or when it doesnt observe packets for some time (idle time). A flow is defined by a unique link between the sender and receiver, e.g., a tuple of <em>(IP_Protocol_Type, IP_Source_Address, Source_Port, IP_Destination_Address, Destination_Port)</em>. For a specific flow, tstat calculates a number of statistics of all the packets transmitted over this flow, and then generate a log for such flow with multiple attributes (statistics). A log file is arranged as a simple table where each column is associated to specific information and each row reports the flow during a connection. The log information is a summary of the flow properties. For instance, in the TCP log we can find columns like the starting time of a TCP connection, its duration, the number of sent and received packets, the observed Round Trip Time.\n",
"![](tstat.png)\n",
"\n",
"In this lab, since the focus is on the development of logistic regression from scratch, we only consider a portion of the dataset for simplicity. The data can be found in `log_tcp_part.csv`, in which there are multiple columns, the last one is the class label, indicating the flow is from either **google** or **youtube**, and the rest are features. Your job is a binary classification task to classify the domain of each flow (row) **from scratch**, including:\n",
"- Build a logistic regression model,\n",
"- Evaluate the performance."
]
},
{
"cell_type": "markdown",
"id": "8fc1d837",
"metadata": {},
"source": [
"1. Load the dataset.\n",
"2. Get the list of features (columns 1 to 10).\n",
"3. Add a new column and assign numerical class labels of -1 and 1 to google and youtube.\n",
"4. Answering the following questions:\n",
" - How many features do we have?\n",
" - How many samples do we have in total?\n",
" - How many samples do we have for each class? Are they similar?"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "70294ef9",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/tmp/ipykernel_226018/230400442.py:3: FutureWarning: Downcasting behavior in `replace` is deprecated and will be removed in a future version. To retain the old behavior, explicitly call `result.infer_objects(copy=False)`. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`\n",
" df_tcp.replace({\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>c_msgsize_count</th>\n",
" <th>c_pktsize6</th>\n",
" <th>c_msgsize4</th>\n",
" <th>s_msgsize4</th>\n",
" <th>s_pktsize2</th>\n",
" <th>s_rtt_cnt</th>\n",
" <th>s_rtt_std</th>\n",
" <th>s_msgsize5</th>\n",
" <th>c_msgsize6</th>\n",
" <th>c_sit3</th>\n",
" <th>class</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1418</td>\n",
" <td>0</td>\n",
" <td>0.000000</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0.000</td>\n",
" <td>-1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>0.466732</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0.000</td>\n",
" <td>-1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>0.413304</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0.000</td>\n",
" <td>-1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1418</td>\n",
" <td>1</td>\n",
" <td>0.000000</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0.000</td>\n",
" <td>-1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1418</td>\n",
" <td>0</td>\n",
" <td>0.000000</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0.000</td>\n",
" <td>-1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>19995</th>\n",
" <td>4</td>\n",
" <td>0</td>\n",
" <td>37</td>\n",
" <td>0</td>\n",
" <td>1418</td>\n",
" <td>3</td>\n",
" <td>22.224528</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>3.334</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>19996</th>\n",
" <td>6</td>\n",
" <td>45</td>\n",
" <td>45</td>\n",
" <td>57</td>\n",
" <td>1418</td>\n",
" <td>2</td>\n",
" <td>0.000000</td>\n",
" <td>45</td>\n",
" <td>45</td>\n",
" <td>1.252</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>19997</th>\n",
" <td>4</td>\n",
" <td>0</td>\n",
" <td>1205</td>\n",
" <td>0</td>\n",
" <td>531</td>\n",
" <td>4</td>\n",
" <td>15.323660</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>4975.694</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>19998</th>\n",
" <td>4</td>\n",
" <td>0</td>\n",
" <td>690</td>\n",
" <td>0</td>\n",
" <td>767</td>\n",
" <td>4</td>\n",
" <td>17.997651</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1719.125</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>19999</th>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0.000000</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0.000</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>20000 rows × 11 columns</p>\n",
"</div>"
],
"text/plain": [
" c_msgsize_count c_pktsize6 c_msgsize4 s_msgsize4 s_pktsize2 \\\n",
"0 1 0 0 0 1418 \n",
"1 1 0 0 0 0 \n",
"2 1 0 0 0 0 \n",
"3 1 0 0 0 1418 \n",
"4 1 0 0 0 1418 \n",
"... ... ... ... ... ... \n",
"19995 4 0 37 0 1418 \n",
"19996 6 45 45 57 1418 \n",
"19997 4 0 1205 0 531 \n",
"19998 4 0 690 0 767 \n",
"19999 1 0 0 0 0 \n",
"\n",
" s_rtt_cnt s_rtt_std s_msgsize5 c_msgsize6 c_sit3 class \n",
"0 0 0.000000 0 0 0.000 -1 \n",
"1 3 0.466732 0 0 0.000 -1 \n",
"2 3 0.413304 0 0 0.000 -1 \n",
"3 1 0.000000 0 0 0.000 -1 \n",
"4 0 0.000000 0 0 0.000 -1 \n",
"... ... ... ... ... ... ... \n",
"19995 3 22.224528 0 0 3.334 1 \n",
"19996 2 0.000000 45 45 1.252 1 \n",
"19997 4 15.323660 0 0 4975.694 1 \n",
"19998 4 17.997651 0 0 1719.125 1 \n",
"19999 1 0.000000 0 0 0.000 1 \n",
"\n",
"[20000 rows x 11 columns]"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_tcp = pd.read_csv('log_tcp_part.csv')\n",
"features = df_tcp.columns[:-1] # Remove class\n",
"df_tcp.replace({\n",
" \"class\": {\n",
" \"google\": -1,\n",
" \"youtube\": 1,\n",
" }\n",
"}, inplace=True)\n",
"\n",
"df_tcp"
]
},
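{
"cell_type": "markdown",
"id": "b7e2d101",
"metadata": {},
"source": [
"The `FutureWarning` above comes from the deprecated downcasting behaviour of `replace`. A possible warning-free alternative is to map the labels explicitly; the sketch below uses a separate copy of the data (`df_alt` is just an illustrative name), so it does not modify `df_tcp`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b7e2d102",
"metadata": {},
"outputs": [],
"source": [
"# A sketch of an alternative label encoding that avoids the pandas FutureWarning:\n",
"# re-read the CSV into a separate frame and map the string labels to -1/1 explicitly.\n",
"df_alt = pd.read_csv('log_tcp_part.csv')\n",
"df_alt[\"class\"] = df_alt[\"class\"].map({\"google\": -1, \"youtube\": 1})\n",
"df_alt[\"class\"].value_counts()"
]
},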
{
"cell_type": "code",
"execution_count": 41,
"id": "48d85d94",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Number of features: 10\n",
"Number of samples: 20000\n",
"Number of samples of google: 10000\n",
"Number of samples of youtube: 10000\n"
]
}
],
"source": [
"num_features = features.size\n",
"num_samples = len(df_tcp)\n",
"num_google = len(df_tcp.loc[df_tcp[\"class\"] == -1])\n",
"num_youtube = len(df_tcp.loc[df_tcp[\"class\"] == 1])\n",
"\n",
"print(f\"Number of features: {num_features}\")\n",
"print(f\"Number of samples: {num_samples}\")\n",
"print(f\"Number of samples of google: {num_google}\")\n",
"print(f\"Number of samples of youtube: {num_youtube}\")"
]
},
{
"cell_type": "markdown",
"id": "c1c8cc80",
"metadata": {},
"source": [
"### 2. Implement your logistic regression learning algorithm\n",
"Here you will need to construct a class in which you need to define two functions besides the class initialization:\n",
"- `fit`. In this method you will perform ERM. Learn the parameters of the model (i.e., the hypothesis h) from training with gradient descent\n",
"- `predict`. In this method given one sample x (or more) you will perform the inference $sign(h(x))$ to obtain class labels.\n",
"\n",
"Hints:\n",
"\n",
"- The linear function used in the logistic regression is the following: $h(x)=w^T x +b $, where b is a scalar bias.\n",
"- Logistic loss: $L((x,y),h)=\\log(1+e^{-y h(x)})$\n",
"- ERM: $\\min_{w,b} f(w,b)=\\frac{1}{m}\\sum_{i=1}^{m} \\log(1+e^{-y^{(i)} h(x^{(i)})})$\n",
"- Gradient for weight: $\\nabla_w f(w,b) = \\frac{1}{m} \\sum_i \\frac{-y^{(i)}x^{(i)}}{(1+e^{y^{(i)}h(x^{(i)})})}$\n",
"- Gradient for bias: $\\nabla_b f(w,b)= \\frac{1}{m} \\sum_i \\frac{-y^{(i)}}{(1+e^{y^{(i)}h(x^{(i)})})}$\n",
"- Update the parameters: $w \\leftarrow w - \\alpha \\nabla w$, $b \\leftarrow b - \\alpha \\nabla b$\n",
"\n",
"Notice that the sigmoid function $f(z) = \\frac{1}{1 + e^{-z}}$ appears multiple times. You can write also a method for the sigmoid function to help you in the computation. By considering f(z), the gradients rewrite as:\n",
"\n",
"- Gradient for weight: $\\nabla_w f(w,b) = \\frac{1}{m} \\sum_i ({f(h(x^{(i)})) - y^{(i)}})x^{(i)}$\n",
"- Gradient for bias: $\\nabla_b f(w,b) = \\frac{1}{m} \\sum_i ({f(h(x^{(i)})) - y^{(i)}})$"
]
},
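{
"cell_type": "markdown",
"id": "c4d5e201",
"metadata": {},
"source": [
"A small implementation note: with unscaled features $h(x)$ can become large in magnitude, and $e^{-z}$ may overflow in floating point (as the warnings further below show). One possible numerically stable sigmoid is sketched here (not required by the exercise; `stable_sigmoid` is just an illustrative name): it only ever calls `np.exp` on non-positive arguments."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c4d5e202",
"metadata": {},
"outputs": [],
"source": [
"# A possible numerically stable sigmoid (a sketch): np.exp is only called on -|z| <= 0,\n",
"# so it cannot overflow; both branches are algebraically equal to 1 / (1 + e^{-z}).\n",
"def stable_sigmoid(z):\n",
"    z = np.asarray(z, dtype=float)\n",
"    e = np.exp(-np.abs(z))\n",
"    return np.where(z >= 0, 1.0 / (1.0 + e), e / (1.0 + e))\n",
"\n",
"# e.g. stable_sigmoid(np.array([-1000.0, 0.0, 1000.0])) -> array([0. , 0.5, 1. ])"
]
},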
{
"cell_type": "code",
"execution_count": 176,
"id": "90a02f52",
"metadata": {},
"outputs": [],
"source": [
"def sigmoid(z):\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"\n",
"class LogisticRegression:\n",
" def __init__(self, learning_rate, num_iterations):\n",
" self.learning_rate = learning_rate\n",
" self.num_iterations = num_iterations\n",
"\n",
"\n",
" def h(self, X):\n",
" return np.dot(X, self.w) + self.b\n",
" \n",
" \n",
" def gradient_step_w(self, m, X, y):\n",
" h = self.h(X)\n",
" f = sigmoid(h)\n",
" s = np.dot(X.T, np.subtract(f, y))\n",
"\n",
" return s/m\n",
" \n",
"\n",
" def gradient_step_b(self, m, X, y):\n",
" h = self.h(X)\n",
" f = sigmoid(h)\n",
" s = np.subtract(f, y).sum()\n",
" \n",
" return s/m\n",
"\n",
"\n",
" def fit(self, X, y):\n",
" self.w = np.zeros((X.shape[1]))\n",
" self.b = 0\n",
" m = len(X)\n",
" \n",
" for i in range(self.num_iterations):\n",
" w_step = self.gradient_step_w(m, X, y)\n",
" b_step = self.gradient_step_b(m, X, y)\n",
"\n",
" self.w -= self.learning_rate*w_step\n",
" self.b -= self.learning_rate*b_step\n",
"\n",
" y_predict = np.transpose(self.predict(X))==y\n",
" correct_predictions = np.count_nonzero(y_predict == True)\n",
" accuracy = correct_predictions/len(y)\n",
" print(accuracy)\n",
" \n",
"\n",
" def predict(self, X):\n",
" if self.w is None or self.b is None:\n",
" raise ValueError\n",
" \n",
" p = self.h(X)\n",
" return np.sign(p)"
]
},
{
"cell_type": "markdown",
"id": "cc478b78",
"metadata": {},
"source": [
"### 3. Use the model\n",
"- Initialize your model with predefined learning rate of `0.1` and iterations of `100`.\n",
"- Fit your model with features and targets.\n",
"- Get the prediction with features."
]
},
{
"cell_type": "code",
"execution_count": 177,
"id": "af5a590d",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.5768\n",
"0.5845333333333333\n",
"0.5632\n",
"0.6148\n",
"0.5904666666666667\n",
"0.617\n",
"0.5955333333333334\n",
"0.5915333333333334\n",
"0.6082666666666666\n",
"0.5925333333333334\n",
"0.6115333333333334\n",
"0.5924666666666667\n",
"0.6012666666666666\n",
"0.5922666666666667\n",
"0.6109333333333333\n",
"0.5952\n",
"0.5922\n",
"0.5996666666666667\n",
"0.5904666666666667\n",
"0.6062\n",
"0.5915333333333334\n",
"0.5988\n",
"0.5916\n",
"0.5979333333333333\n",
"0.5917333333333333\n",
"0.5962\n",
"0.5933333333333334\n",
"0.5955333333333334\n",
"0.5945333333333334\n",
"0.5947333333333333\n",
"0.5946\n",
"0.5946666666666667\n",
"0.5947333333333333\n",
"0.5946666666666667\n",
"0.5946666666666667\n",
"0.5946666666666667\n",
"0.5946\n",
"0.5946\n",
"0.5946666666666667\n",
"0.5945333333333334\n",
"0.5946666666666667\n",
"0.5945333333333334\n",
"0.5944666666666667\n",
"0.5946\n",
"0.5945333333333334\n",
"0.5946\n",
"0.5946\n",
"0.5946666666666667\n",
"0.5946\n",
"0.5945333333333334\n",
"0.5946666666666667\n",
"0.5946\n",
"0.5946\n",
"0.5946\n",
"0.5946666666666667\n",
"0.5945333333333334\n",
"0.5946\n",
"0.5946666666666667\n",
"0.5945333333333334\n",
"0.5946666666666667\n",
"0.5946666666666667\n",
"0.5946\n",
"0.5947333333333333\n",
"0.5946666666666667\n",
"0.5947333333333333\n",
"0.5946\n",
"0.5947333333333333\n",
"0.5946666666666667\n",
"0.5946\n",
"0.5946666666666667\n",
"0.5946\n",
"0.5946\n",
"0.5946\n",
"0.5946\n",
"0.5946\n",
"0.5946666666666667\n",
"0.5947333333333333\n",
"0.5946666666666667\n",
"0.5947333333333333\n",
"0.5948\n",
"0.5948\n",
"0.5948\n",
"0.5948\n",
"0.5948666666666667\n",
"0.5947333333333333\n",
"0.5948\n",
"0.5948666666666667\n",
"0.5947333333333333\n",
"0.5948666666666667\n",
"0.5947333333333333\n",
"0.5947333333333333\n",
"0.5947333333333333\n",
"0.5947333333333333\n",
"0.5948\n",
"0.5946666666666667\n",
"0.5948\n",
"0.5947333333333333\n",
"0.5946666666666667\n",
"0.5948\n",
"0.5947333333333333\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n",
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n"
]
}
],
"source": [
"from sklearn.model_selection import train_test_split\n",
"\n",
"x_train, x_test, y_train, y_test = train_test_split(df_tcp.drop(columns=[\"class\"], inplace=False), df_tcp[\"class\"])\n",
"\n",
"lr = LogisticRegression(0.1, 100)\n",
"lr.fit(x_train, y_train.values)"
]
},
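{
"cell_type": "markdown",
"id": "d9a0f301",
"metadata": {},
"source": [
"The `RuntimeWarning` messages above appear because the features have very different scales, so $h(x)$ can become large enough to overflow `np.exp` inside the sigmoid. One possible remedy, sketched below and not part of the original exercise, is to standardize each feature with the mean and standard deviation of the training split before fitting (`x_train_std`, `x_test_std` and `lr_std` are illustrative names)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d9a0f302",
"metadata": {},
"outputs": [],
"source": [
"# A sketch: z-score standardization using statistics from the training split only\n",
"mu = x_train.mean()\n",
"sigma = x_train.std().replace(0, 1)  # guard against constant columns\n",
"x_train_std = (x_train - mu) / sigma\n",
"x_test_std = (x_test - mu) / sigma\n",
"\n",
"# Refit the same model on the standardized features\n",
"lr_std = LogisticRegression(0.1, 100)\n",
"lr_std.fit(x_train_std, y_train.values)"
]
},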
{
"cell_type": "code",
"execution_count": 174,
"id": "beda67a9",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.6056"
]
},
"execution_count": 174,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"y_predict = np.transpose(lr.predict(x_test))==y_test.values\n",
"correct_predictions = np.count_nonzero(y_predict == True)\n",
"accuracy = correct_predictions/len(y_test)\n",
"\n",
"accuracy"
]
},
{
"cell_type": "code",
"execution_count": 175,
"id": "8db63dad",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.5965333333333334"
]
},
"execution_count": 175,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"y_predict = np.transpose(lr.predict(x_train))==y_train.values\n",
"correct_predictions = np.count_nonzero(y_predict == True)\n",
"accuracy = correct_predictions/len(y_train)\n",
"\n",
"accuracy"
]
},
{
"cell_type": "markdown",
"id": "bc5ad9e7",
"metadata": {},
"source": [
"### 4. Model evaluation\n",
"With predicted class labels and ground truths, we now evaluate the model performance through confusion matrix and numerical metrics. Specifically, you need to derive the following:\n",
"- Confusion matrix - Note that, you should indicate the corresponding quantity of each element in the table. Here positive is class 1 and negative is class -1:\n",
"\\begin{array}{|c|c|c|}\n",
"\\hline\n",
" & \\textbf{Predicted Positive} & \\textbf{Predicted Negative} \\\\\n",
"\\hline\n",
"\\textbf{Actual Positive} & \\text{True Positive (TP)} & \\text{False Negative (FN)} \\\\\n",
"\\hline\n",
"\\textbf{Actual Negative} & \\text{False Positive (FP)} & \\text{True Negative (TN)} \\\\\n",
"\\hline\n",
"\\end{array}\n",
"- Precision of each class and the average value:\n",
"$\\frac{\\text{True Positive (TP)}}{\\text{True Positive (TP) + False Positive (FP)}}$\n",
"- Recall of each class and the average value:\n",
"$\\frac{\\text{True Positive (TP)}}{\\text{True Positive (TP) + False Negative (FN)}}$\n",
"- F1-score of each class and the average value:\n",
"$F_1 = \\frac{2 \\times \\text{Precision} \\times \\text{Recall}}{\\text{Precision} + \\text{Recall}}$\n",
"- Accuracy:\n",
"$\\frac{\\text{True Positive (TP) + True Negative (TN)}}{\\text{True Positive (TP) + True Negative (TN) + False Positive (FP) + False Negative (FN)}}$\n",
"- Answering the following questions:\n",
" - Do you have same performance between classes? If not, which one performs better?\n",
" - Change the parameters of learning rate or number of iterations. Do you have same performance? Better or Worse? Why?"
]
},
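{
"cell_type": "markdown",
"id": "e1b2c401",
"metadata": {},
"source": [
"One possible from-scratch sketch of these metrics is given below, assuming the fitted `lr`, `x_test` and `y_test` from the cells above (positive = class 1, i.e. youtube; negative = class -1, i.e. google)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e1b2c402",
"metadata": {},
"outputs": [],
"source": [
"# A sketch of the confusion matrix and metrics, computed from scratch on the test split\n",
"y_true = y_test.values\n",
"y_pred = lr.predict(x_test)\n",
"\n",
"tp = int(np.sum((y_pred == 1) & (y_true == 1)))\n",
"fn = int(np.sum((y_pred == -1) & (y_true == 1)))\n",
"fp = int(np.sum((y_pred == 1) & (y_true == -1)))\n",
"tn = int(np.sum((y_pred == -1) & (y_true == -1)))\n",
"\n",
"print(\"Confusion matrix\")\n",
"print(\"            pred +1   pred -1\")\n",
"print(f\"actual +1 {tp:9d} {fn:9d}\")\n",
"print(f\"actual -1 {fp:9d} {tn:9d}\")\n",
"\n",
"def precision_recall_f1(tp_, fp_, fn_):\n",
"    precision = tp_ / (tp_ + fp_) if (tp_ + fp_) > 0 else 0.0\n",
"    recall = tp_ / (tp_ + fn_) if (tp_ + fn_) > 0 else 0.0\n",
"    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) > 0 else 0.0\n",
"    return precision, recall, f1\n",
"\n",
"# Per-class metrics: for class -1 the roles of the counts are swapped\n",
"p_pos, r_pos, f1_pos = precision_recall_f1(tp, fp, fn)\n",
"p_neg, r_neg, f1_neg = precision_recall_f1(tn, fn, fp)\n",
"accuracy = (tp + tn) / (tp + tn + fp + fn)\n",
"\n",
"print(f\"class +1 (youtube): precision={p_pos:.3f}  recall={r_pos:.3f}  F1={f1_pos:.3f}\")\n",
"print(f\"class -1 (google):  precision={p_neg:.3f}  recall={r_neg:.3f}  F1={f1_neg:.3f}\")\n",
"print(f\"average:            precision={(p_pos + p_neg) / 2:.3f}  recall={(r_pos + r_neg) / 2:.3f}  F1={(f1_pos + f1_neg) / 2:.3f}\")\n",
"print(f\"accuracy={accuracy:.3f}\")"
]
},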
{
"cell_type": "code",
"execution_count": null,
"id": "15b74982",
"metadata": {},
"outputs": [],
"source": [
"# your answers here"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.2"
},
"varInspector": {
"cols": {
"lenName": 16,
"lenType": 16,
"lenVar": 40
},
"kernels_config": {
"python": {
"delete_cmd_postfix": "",
"delete_cmd_prefix": "del ",
"library": "var_list.py",
"varRefreshCmd": "print(var_dic_list())"
},
"r": {
"delete_cmd_postfix": ") ",
"delete_cmd_prefix": "rm(",
"library": "var_list.r",
"varRefreshCmd": "cat(var_dic_list()) "
}
},
"types_to_exclude": [
"module",
"function",
"builtin_function_or_method",
"instance",
"_Feature"
],
"window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 5
}