Compare commits

..

3 Commits

Author SHA1 Message Date
d5b768962c labs: Add eight lab 2024-11-28 23:01:35 +01:00
25846ac643 labs: Add seventh base code 2024-11-21 17:52:19 +01:00
ae84532d96 labs: Add partial sixth lab 2024-11-21 16:21:30 +01:00
10 changed files with 304490 additions and 0 deletions

919
Labs/Lab 6/lab_6.ipynb Normal file

@@ -0,0 +1,919 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "426a8016",
"metadata": {},
"source": [
"<center><b><font size=6>Lab-6 A classifier from scratch</font></b></center>"
]
},
{
"cell_type": "markdown",
"id": "a39139f5",
"metadata": {},
"source": [
"### Objective: Implement, use and evaluate a classifier (without using specific libraries such as sklearn)\n",
"1. **Logistic regression** is a binary classification method that maps a linear combination of parameters and variables into two possible classes. Here, you will implement the logistic regression from scratch to better understand how an ML algorithm works. Useful link: <a href=\"https://en.wikipedia.org/wiki/Logistic_regression\">Wiki</a>.\n",
"2. **Performance evaluation metrics** are needed to evaluate the outcome of prediction with respect to true labels. Here, you will implement confusion matrix, accuracy, precision, recall and F-measure. Useful link: <a href=\"https://en.wikipedia.org/wiki/Confusion_matrix\">Wiki</a>."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "b6bf32f9",
"metadata": {},
"outputs": [],
"source": [
"# import needed python libraries\n",
"\n",
"%matplotlib inline\n",
"\n",
"import pandas as pd\n",
"import seaborn as sns\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"import random"
]
},
{
"cell_type": "markdown",
"id": "c0959af0",
"metadata": {},
"source": [
"### 1. Dataset - TCP logs\n",
"The dataset contains traffic information generated by an open-source passive network monitoring tool, namely **tstat**. It automates the collection of packet statistics of traffic aggregates, using real-time monitoring features. Because it is a passive tool, its typical usage scenario is live monitoring of Internet links, in which all transmitted packets are observed. In the case of TCP, tstat identifies the start of a new flow when it observes a TCP three-way handshake. Similarly, it identifies the end of a TCP flow either when it sees the TCP connection teardown, or when it does not observe packets for some time (idle time). A flow is defined by a unique link between the sender and receiver, e.g., a tuple of <em>(IP_Protocol_Type, IP_Source_Address, Source_Port, IP_Destination_Address, Destination_Port)</em>. For a specific flow, tstat calculates a number of statistics over all the packets transmitted on this flow, and then generates a log entry for the flow with multiple attributes (statistics). A log file is arranged as a simple table where each column is associated with specific information and each row reports the flow during a connection. The log information is a summary of the flow properties. For instance, in the TCP log we can find columns like the starting time of a TCP connection, its duration, the number of sent and received packets, and the observed Round Trip Time.\n",
"![](tstat.png)\n",
"\n",
"In this lab, since the focus is on the development of logistic regression from scratch, we only consider a portion of the dataset for simplicity. The data can be found in `log_tcp_part.csv`. The last column is the class label, indicating whether the flow is from **google** or **youtube**; the remaining columns are features. Your job is a binary classification task to classify the domain of each flow (row) **from scratch**, including:\n",
"- Build a logistic regression model,\n",
"- Evaluate the performance."
]
},
{
"cell_type": "markdown",
"id": "8fc1d837",
"metadata": {},
"source": [
"1. Load the dataset.\n",
"2. Get the list of features (columns 1 to 10).\n",
"3. Add a new column and assign numerical class labels of -1 and 1 to google and youtube.\n",
"4. Answer the following questions:\n",
" - How many features do we have?\n",
" - How many samples do we have in total?\n",
" - How many samples do we have for each class? Are they similar?"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "70294ef9",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/tmp/ipykernel_226018/230400442.py:3: FutureWarning: Downcasting behavior in `replace` is deprecated and will be removed in a future version. To retain the old behavior, explicitly call `result.infer_objects(copy=False)`. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`\n",
" df_tcp.replace({\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>c_msgsize_count</th>\n",
" <th>c_pktsize6</th>\n",
" <th>c_msgsize4</th>\n",
" <th>s_msgsize4</th>\n",
" <th>s_pktsize2</th>\n",
" <th>s_rtt_cnt</th>\n",
" <th>s_rtt_std</th>\n",
" <th>s_msgsize5</th>\n",
" <th>c_msgsize6</th>\n",
" <th>c_sit3</th>\n",
" <th>class</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1418</td>\n",
" <td>0</td>\n",
" <td>0.000000</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0.000</td>\n",
" <td>-1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>0.466732</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0.000</td>\n",
" <td>-1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>0.413304</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0.000</td>\n",
" <td>-1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1418</td>\n",
" <td>1</td>\n",
" <td>0.000000</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0.000</td>\n",
" <td>-1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1418</td>\n",
" <td>0</td>\n",
" <td>0.000000</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0.000</td>\n",
" <td>-1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>19995</th>\n",
" <td>4</td>\n",
" <td>0</td>\n",
" <td>37</td>\n",
" <td>0</td>\n",
" <td>1418</td>\n",
" <td>3</td>\n",
" <td>22.224528</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>3.334</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>19996</th>\n",
" <td>6</td>\n",
" <td>45</td>\n",
" <td>45</td>\n",
" <td>57</td>\n",
" <td>1418</td>\n",
" <td>2</td>\n",
" <td>0.000000</td>\n",
" <td>45</td>\n",
" <td>45</td>\n",
" <td>1.252</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>19997</th>\n",
" <td>4</td>\n",
" <td>0</td>\n",
" <td>1205</td>\n",
" <td>0</td>\n",
" <td>531</td>\n",
" <td>4</td>\n",
" <td>15.323660</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>4975.694</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>19998</th>\n",
" <td>4</td>\n",
" <td>0</td>\n",
" <td>690</td>\n",
" <td>0</td>\n",
" <td>767</td>\n",
" <td>4</td>\n",
" <td>17.997651</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1719.125</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>19999</th>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0.000000</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0.000</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>20000 rows × 11 columns</p>\n",
"</div>"
],
"text/plain": [
" c_msgsize_count c_pktsize6 c_msgsize4 s_msgsize4 s_pktsize2 \\\n",
"0 1 0 0 0 1418 \n",
"1 1 0 0 0 0 \n",
"2 1 0 0 0 0 \n",
"3 1 0 0 0 1418 \n",
"4 1 0 0 0 1418 \n",
"... ... ... ... ... ... \n",
"19995 4 0 37 0 1418 \n",
"19996 6 45 45 57 1418 \n",
"19997 4 0 1205 0 531 \n",
"19998 4 0 690 0 767 \n",
"19999 1 0 0 0 0 \n",
"\n",
" s_rtt_cnt s_rtt_std s_msgsize5 c_msgsize6 c_sit3 class \n",
"0 0 0.000000 0 0 0.000 -1 \n",
"1 3 0.466732 0 0 0.000 -1 \n",
"2 3 0.413304 0 0 0.000 -1 \n",
"3 1 0.000000 0 0 0.000 -1 \n",
"4 0 0.000000 0 0 0.000 -1 \n",
"... ... ... ... ... ... ... \n",
"19995 3 22.224528 0 0 3.334 1 \n",
"19996 2 0.000000 45 45 1.252 1 \n",
"19997 4 15.323660 0 0 4975.694 1 \n",
"19998 4 17.997651 0 0 1719.125 1 \n",
"19999 1 0.000000 0 0 0.000 1 \n",
"\n",
"[20000 rows x 11 columns]"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_tcp = pd.read_csv('log_tcp_part.csv')\n",
"features = df_tcp.columns[:-1] # Remove class\n",
"df_tcp.replace({\n",
" \"class\": {\n",
" \"google\": -1,\n",
" \"youtube\": 1,\n",
" }\n",
"}, inplace=True)\n",
"\n",
"df_tcp"
]
},
{
"cell_type": "code",
"execution_count": 41,
"id": "48d85d94",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Number of features: 10\n",
"Number of samples: 20000\n",
"Number of samples of google: 10000\n",
"Number of samples of youtube: 10000\n"
]
}
],
"source": [
"num_features = features.size\n",
"num_samples = len(df_tcp)\n",
"num_google = len(df_tcp.loc[df_tcp[\"class\"] == -1])\n",
"num_youtube = len(df_tcp.loc[df_tcp[\"class\"] == 1])\n",
"\n",
"print(f\"Number of features: {num_features}\")\n",
"print(f\"Number of samples: {num_samples}\")\n",
"print(f\"Number of samples of google: {num_google}\")\n",
"print(f\"Number of samples of youtube: {num_youtube}\")"
]
},
{
"cell_type": "markdown",
"id": "c1c8cc80",
"metadata": {},
"source": [
"### 2. Implement your logistic regression learning algorithm\n",
"Here you will construct a class that defines two methods besides the class initialization:\n",
"- `fit`. In this method you will perform ERM: learn the parameters of the model (i.e., the hypothesis h) from the training data with gradient descent.\n",
"- `predict`. In this method, given one sample x (or more), you will perform the inference $sign(h(x))$ to obtain class labels.\n",
"\n",
"Hints:\n",
"\n",
"- The linear function used in the logistic regression is the following: $h(x)=w^T x +b $, where b is a scalar bias.\n",
"- Logistic loss: $L((x,y),h)=\\log(1+e^{-y h(x)})$\n",
"- ERM: $\\min_{w,b} f(w,b)=\\frac{1}{m}\\sum_{i=1}^{m} \\log(1+e^{-y^{(i)} h(x^{(i)})})$\n",
"- Gradient for weight: $\\nabla_w f(w,b) = \\frac{1}{m} \\sum_i \\frac{-y^{(i)}x^{(i)}}{(1+e^{y^{(i)}h(x^{(i)})})}$\n",
"- Gradient for bias: $\\nabla_b f(w,b)= \\frac{1}{m} \\sum_i \\frac{-y^{(i)}}{(1+e^{y^{(i)}h(x^{(i)})})}$\n",
"- Update the parameters: $w \\leftarrow w - \\alpha \\nabla_w f(w,b)$, $b \\leftarrow b - \\alpha \\nabla_b f(w,b)$\n",
"\n",
"Notice that the sigmoid function $\\sigma(z) = \\frac{1}{1 + e^{-z}}$ appears multiple times, since $\\frac{1}{1+e^{y h(x)}} = \\sigma(-y h(x))$. You can also write a method for the sigmoid function to help with the computation. With labels $y^{(i)} \\in \\{-1, 1\\}$, the gradients can be rewritten as:\n",
"\n",
"- Gradient for weight: $\\nabla_w f(w,b) = \\frac{1}{m} \\sum_i -y^{(i)} \\sigma(-y^{(i)} h(x^{(i)})) x^{(i)}$\n",
"- Gradient for bias: $\\nabla_b f(w,b) = \\frac{1}{m} \\sum_i -y^{(i)} \\sigma(-y^{(i)} h(x^{(i)}))$\n",
"\n",
"Equivalently, with the labels re-encoded as $t^{(i)} = (y^{(i)}+1)/2 \\in \\{0, 1\\}$, each term becomes $(\\sigma(h(x^{(i)})) - t^{(i)})x^{(i)}$ for the weights and $\\sigma(h(x^{(i)})) - t^{(i)}$ for the bias. Applying this second form directly to the $\\pm 1$ labels gives a wrong gradient."
]
},
{
"cell_type": "code",
"execution_count": 176,
"id": "90a02f52",
"metadata": {},
"outputs": [],
"source": [
"def sigmoid(z):\n",
"    return 1/(1+np.exp(np.negative(z)))\n",
"\n",
"\n",
"class LogisticRegression:\n",
"    def __init__(self, learning_rate, num_iterations):\n",
"        self.learning_rate = learning_rate\n",
"        self.num_iterations = num_iterations\n",
"        # Parameters stay None until fit() is called, so predict() can detect it\n",
"        self.w = None\n",
"        self.b = None\n",
"\n",
"    def h(self, X):\n",
"        return np.dot(X, self.w) + self.b\n",
"\n",
"    def gradient_coefficients(self, X, y):\n",
"        # For labels y in {-1, +1}: -y / (1 + e^{y h(x)}) = -y * sigmoid(-y h(x))\n",
"        return -y * sigmoid(-y * self.h(X))\n",
"\n",
"    def gradient_step_w(self, m, X, y):\n",
"        return np.dot(X.T, self.gradient_coefficients(X, y)) / m\n",
"\n",
"    def gradient_step_b(self, m, X, y):\n",
"        return self.gradient_coefficients(X, y).sum() / m\n",
"\n",
"    def fit(self, X, y):\n",
"        # Work on plain float arrays so DataFrame/Series inputs behave the same\n",
"        X = np.asarray(X, dtype=float)\n",
"        y = np.asarray(y, dtype=float)\n",
"        self.w = np.zeros(X.shape[1])\n",
"        self.b = 0.0\n",
"        m = len(X)\n",
"\n",
"        for _ in range(self.num_iterations):\n",
"            # Compute both gradients before updating either parameter\n",
"            w_step = self.gradient_step_w(m, X, y)\n",
"            b_step = self.gradient_step_b(m, X, y)\n",
"\n",
"            self.w -= self.learning_rate * w_step\n",
"            self.b -= self.learning_rate * b_step\n",
"\n",
"            # Training accuracy after this iteration\n",
"            accuracy = np.mean(self.predict(X) == y)\n",
"            print(accuracy)\n",
"\n",
"    def predict(self, X):\n",
"        if self.w is None or self.b is None:\n",
"            raise ValueError('Model is not fitted yet; call fit() first')\n",
"\n",
"        return np.sign(self.h(X))"
]
},
{
"cell_type": "markdown",
"id": "cc478b78",
"metadata": {},
"source": [
"### 3. Use the model\n",
"- Initialize your model with predefined learning rate of `0.1` and iterations of `100`.\n",
"- Fit your model with features and targets.\n",
"- Get the prediction with features."
]
},
{
"cell_type": "code",
"execution_count": 177,
"id": "af5a590d",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.5768\n",
"0.5845333333333333\n",
"0.5632\n",
"0.6148\n",
"0.5904666666666667\n",
"0.617\n",
"0.5955333333333334\n",
"0.5915333333333334\n",
"0.6082666666666666\n",
"0.5925333333333334\n",
"0.6115333333333334\n",
"0.5924666666666667\n",
"0.6012666666666666\n",
"0.5922666666666667\n",
"0.6109333333333333\n",
"0.5952\n",
"0.5922\n",
"0.5996666666666667\n",
"0.5904666666666667\n",
"0.6062\n",
"0.5915333333333334\n",
"0.5988\n",
"0.5916\n",
"0.5979333333333333\n",
"0.5917333333333333\n",
"0.5962\n",
"0.5933333333333334\n",
"0.5955333333333334\n",
"0.5945333333333334\n",
"0.5947333333333333\n",
"0.5946\n",
"0.5946666666666667\n",
"0.5947333333333333\n",
"0.5946666666666667\n",
"0.5946666666666667\n",
"0.5946666666666667\n",
"0.5946\n",
"0.5946\n",
"0.5946666666666667\n",
"0.5945333333333334\n",
"0.5946666666666667\n",
"0.5945333333333334\n",
"0.5944666666666667\n",
"0.5946\n",
"0.5945333333333334\n",
"0.5946\n",
"0.5946\n",
"0.5946666666666667\n",
"0.5946\n",
"0.5945333333333334\n",
"0.5946666666666667\n",
"0.5946\n",
"0.5946\n",
"0.5946\n",
"0.5946666666666667\n",
"0.5945333333333334\n",
"0.5946\n",
"0.5946666666666667\n",
"0.5945333333333334\n",
"0.5946666666666667\n",
"0.5946666666666667\n",
"0.5946\n",
"0.5947333333333333\n",
"0.5946666666666667\n",
"0.5947333333333333\n",
"0.5946\n",
"0.5947333333333333\n",
"0.5946666666666667\n",
"0.5946\n",
"0.5946666666666667\n",
"0.5946\n",
"0.5946\n",
"0.5946\n",
"0.5946\n",
"0.5946\n",
"0.5946666666666667\n",
"0.5947333333333333\n",
"0.5946666666666667\n",
"0.5947333333333333\n",
"0.5948\n",
"0.5948\n",
"0.5948\n",
"0.5948\n",
"0.5948666666666667\n",
"0.5947333333333333\n",
"0.5948\n",
"0.5948666666666667\n",
"0.5947333333333333\n",
"0.5948666666666667\n",
"0.5947333333333333\n",
"0.5947333333333333\n",
"0.5947333333333333\n",
"0.5947333333333333\n",
"0.5948\n",
"0.5946666666666667\n",
"0.5948\n",
"0.5947333333333333\n",
"0.5946666666666667\n",
"0.5948\n",
"0.5947333333333333\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"/tmp/ipykernel_226018/1018497028.py:2: RuntimeWarning: overflow encountered in exp\n",
" return 1/(1+np.exp(np.negative(z)))\n"
]
}
],
"source": [
"from sklearn.model_selection import train_test_split\n",
"\n",
"x_train, x_test, y_train, y_test = train_test_split(df_tcp.drop(columns=[\"class\"], inplace=False), df_tcp[\"class\"])\n",
"\n",
"lr = LogisticRegression(0.1, 100)\n",
"lr.fit(x_train, y_train.values)"
]
},
{
"cell_type": "code",
"execution_count": 174,
"id": "beda67a9",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.6056"
]
},
"execution_count": 174,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"y_predict = np.transpose(lr.predict(x_test))==y_test.values\n",
"correct_predictions = np.count_nonzero(y_predict == True)\n",
"accuracy = correct_predictions/len(y_test)\n",
"\n",
"accuracy"
]
},
{
"cell_type": "code",
"execution_count": 175,
"id": "8db63dad",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.5965333333333334"
]
},
"execution_count": 175,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"y_predict = np.transpose(lr.predict(x_train))==y_train.values\n",
"correct_predictions = np.count_nonzero(y_predict == True)\n",
"accuracy = correct_predictions/len(y_train)\n",
"\n",
"accuracy"
]
},
{
"cell_type": "markdown",
"id": "bc5ad9e7",
"metadata": {},
"source": [
"### 4. Model evaluation\n",
"With predicted class labels and ground truths, we now evaluate the model performance through confusion matrix and numerical metrics. Specifically, you need to derive the following:\n",
"- Confusion matrix. Report the count for each cell of the table. Here, positive is class 1 and negative is class -1:\n",
"\\begin{array}{|c|c|c|}\n",
"\\hline\n",
" & \\textbf{Predicted Positive} & \\textbf{Predicted Negative} \\\\\n",
"\\hline\n",
"\\textbf{Actual Positive} & \\text{True Positive (TP)} & \\text{False Negative (FN)} \\\\\n",
"\\hline\n",
"\\textbf{Actual Negative} & \\text{False Positive (FP)} & \\text{True Negative (TN)} \\\\\n",
"\\hline\n",
"\\end{array}\n",
"- Precision of each class and the average value:\n",
"$\\frac{\\text{True Positive (TP)}}{\\text{True Positive (TP) + False Positive (FP)}}$\n",
"- Recall of each class and the average value:\n",
"$\\frac{\\text{True Positive (TP)}}{\\text{True Positive (TP) + False Negative (FN)}}$\n",
"- F1-score of each class and the average value:\n",
"$F_1 = \\frac{2 \\times \\text{Precision} \\times \\text{Recall}}{\\text{Precision} + \\text{Recall}}$\n",
"- Accuracy:\n",
"$\\frac{\\text{True Positive (TP) + True Negative (TN)}}{\\text{True Positive (TP) + True Negative (TN) + False Positive (FP) + False Negative (FN)}}$\n",
"- Answer the following questions:\n",
" - Do both classes have the same performance? If not, which one performs better?\n",
" - Change the learning rate or the number of iterations. Is the performance the same, better, or worse? Why?"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "15b74982",
"metadata": {},
"outputs": [],
"source": [
"# your answers here"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.2"
},
"varInspector": {
"cols": {
"lenName": 16,
"lenType": 16,
"lenVar": 40
},
"kernels_config": {
"python": {
"delete_cmd_postfix": "",
"delete_cmd_prefix": "del ",
"library": "var_list.py",
"varRefreshCmd": "print(var_dic_list())"
},
"r": {
"delete_cmd_postfix": ") ",
"delete_cmd_prefix": "rm(",
"library": "var_list.r",
"varRefreshCmd": "cat(var_dic_list()) "
}
},
"types_to_exclude": [
"module",
"function",
"builtin_function_or_method",
"instance",
"_Feature"
],
"window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 5
}
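
Review note on the gradient hints in `Labs/Lab 6/lab_6.ipynb`: the block below is a minimal sketch, not the lab's reference solution, of the logistic loss and its gradient for labels in {-1, +1}, checked against a finite-difference estimate. It uses only the standard library so it runs without the notebook's environment. The helper names (`log1p_exp`, `gradients`) and the toy data are illustrative assumptions and do not appear in the notebook.

```python
import math


def log1p_exp(z):
    """log(1 + e^z), arranged so math.exp never sees a large positive argument."""
    if z > 0:
        return z + math.log1p(math.exp(-z))
    return math.log1p(math.exp(z))


def sigmoid(z):
    """1 / (1 + e^-z), arranged so math.exp never sees a large positive argument."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def h(w, b, x):
    """Linear score h(x) = w^T x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def loss(w, b, X, y):
    """Mean logistic loss log(1 + e^{-y h(x)}) for labels y in {-1, +1}."""
    return sum(log1p_exp(-yi * h(w, b, xi)) for xi, yi in zip(X, y)) / len(X)


def gradients(w, b, X, y):
    """Gradient of the mean logistic loss with respect to w and b."""
    grad_w = [0.0] * len(w)
    grad_b = 0.0
    for xi, yi in zip(X, y):
        # -y / (1 + e^{y h(x)}) is the same as -y * sigmoid(-y h(x))
        coeff = -yi * sigmoid(-yi * h(w, b, xi))
        for j, xj in enumerate(xi):
            grad_w[j] += coeff * xj
        grad_b += coeff
    m = len(X)
    return [g / m for g in grad_w], grad_b / m


# Hypothetical toy data: four samples, two features, labels in {-1, +1}
X = [[0.5, 1.0], [1.5, -0.5], [-1.0, 0.2], [-0.3, -1.2]]
y = [1, 1, -1, -1]
w, b = [0.1, -0.2], 0.05

grad_w, grad_b = gradients(w, b, X, y)

# Central finite differences as an independent check of the analytic gradient
eps = 1e-6
numeric_b = (loss(w, b + eps, X, y) - loss(w, b - eps, X, y)) / (2 * eps)
numeric_w0 = (
    loss([w[0] + eps, w[1]], b, X, y) - loss([w[0] - eps, w[1]], b, X, y)
) / (2 * eps)

print("bias gradient matches:", abs(grad_b - numeric_b) < 1e-6)
print("w[0] gradient matches:", abs(grad_w[0] - numeric_w0) < 1e-6)
```

The same finite-difference check can be pointed at any candidate gradient formula: a form written for {0, 1} labels will disagree with the numeric estimate when it is fed -1/+1 labels. The branch on the sign of `z` is one common way to avoid the `overflow encountered in exp` warning on unscaled features.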

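Review note on section 4 of `Labs/Lab 6/lab_6.ipynb`, which is still a placeholder: one possible shape for the from-scratch evaluation is sketched below, assuming labels in {-1, +1} as in the notebook. The function names and the example label lists are hypothetical, and the zero-denominator convention (return 0.0) is an assumption rather than something the lab specifies.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN), treating `positive` as the positive class."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        else:
            if t == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def safe_div(num, den):
    """Assumed convention: a metric with a zero denominator is reported as 0.0."""
    return num / den if den else 0.0


def class_metrics(y_true, y_pred, positive):
    """Precision, recall and F1 for one class treated as the positive class."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    f1 = safe_div(2 * precision * recall, precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical labels, only to exercise the functions
y_true = [1, 1, 1, -1, -1, -1, -1, 1]
y_pred = [1, -1, 1, -1, 1, -1, -1, 1]

per_class = {c: class_metrics(y_true, y_pred, positive=c) for c in (1, -1)}
# Average of the per-class values (unweighted, i.e. a macro average)
macro = {
    name: sum(m[name] for m in per_class.values()) / len(per_class)
    for name in ("precision", "recall", "f1")
}

print("TP, FP, FN, TN:", confusion_counts(y_true, y_pred))
print("per class:", per_class)
print("macro average:", macro)
print("accuracy:", accuracy(y_true, y_pred))
```

Computing the per-class values by swapping which label counts as positive keeps a single counting routine. Note that a prediction of `0` (which `np.sign` returns for a score of exactly zero) is counted here as "not positive" for either class.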
20001
Labs/Lab 6/log_tcp_part.csv Normal file

File diff suppressed because it is too large

BIN
Labs/Lab 6/tstat.png Normal file

Binary file not shown.


140001
Labs/Lab 7/RTP_dataset.csv Normal file

File diff suppressed because it is too large

2265
Labs/Lab 7/lab_7.ipynb Normal file

File diff suppressed because one or more lines are too long

Binary file not shown.


140001
Labs/Lab 8/RTP_dataset.csv Normal file

File diff suppressed because it is too large

Binary file not shown.

1303
Labs/Lab 8/lab_8.ipynb Normal file

File diff suppressed because one or more lines are too long

BIN
Labs/Lab 8/validation.png Normal file

Binary file not shown.
