
Prerequisites of Linear Regression in Machine Learning - Python and Linear Algebra

Continuing from my previous topic on matrices, let us see how a system of linear equations can be solved using Python.

Example 1: Solving two linear equations with two variables

import numpy as np
import scipy

# let, one equation be x-y=2
# other equation be   2x+y=4
# in matrix notation - the compact form will be AX=B; X=transpose of [x,y] ; B= transpose of [2,4]

A = np.array([[1, -1], [2, 1]])  # two linear equations with two variables, so a 2 x 2 matrix is created
B = np.array([2, 4])
B_T = B.T    # transpose of B (note: transposing a 1-D NumPy array is a no-op, so B could be used directly)

det_A = np.linalg.det(A)
if np.isclose(det_A, 0):    # det_A is a float, so compare with a tolerance rather than ==0
    print("No unique solution")
else:
    A_inv = scipy.linalg.inv(A)    # inverse of A
    D = A_inv @ B_T    # matrix-vector product - equivalent syntax is np.dot(A_inv, B_T)
    print(f' The solution vector is {D}')
    x, y = D[0], D[1]
    print(f' The solution x={x}, y={y}')
 The solution vector is [2. 0.]
 The solution x=2.0, y=0.0
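As a quick sanity check (a minimal sketch of my own, not part of the original walkthrough), we can substitute the solution back into AX = B and confirm it reproduces B:

```python
import numpy as np

A = np.array([[1, -1], [2, 1]])
B = np.array([2, 4])
D = np.array([2.0, 0.0])   # the solution vector found above

# Back-substitution: A @ D should reproduce B
print(np.allclose(A @ D, B))   # → True
```

`np.allclose` compares with a floating-point tolerance, which is the right way to check results that came out of `det`/`inv` computations.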

Example 2: Solving three linear equations with three variables

import numpy as np
import scipy

# let, First equation is 2x-y+z=12
# Second equation is    2x+y-z=4
# Third equation is     x+2y-z=10
# in matrix notation - the compact form will be AX=B; X=transpose of [x,y,z] ; B= transpose of [12,4,10]

A = np.array([[2, -1, 1], [2, 1, -1],[1, 2, -1]])  # three linear equations with three variables, so a 3 X 3 matrix is created
B = np.array([12,4,10])
B_T = B.T    # transpose of B (note: transposing a 1-D NumPy array is a no-op, so B could be used directly)

det_A = np.linalg.det(A)
if np.isclose(det_A, 0):    # det_A is a float, so compare with a tolerance rather than ==0
    print("No unique solution")
else:
    A_inv = scipy.linalg.inv(A)    # inverse of A
    D = A_inv @ B_T    # matrix-vector product - equivalent syntax is np.dot(A_inv, B_T)
    print (f' The solution vector is {D}')
    
    # Round all values in D (x, y, z) to 4 decimal places
    D_rounded = np.round(D, 4)
    
    # Extract values of x, y, z from the rounded array
    x, y, z = D_rounded[0], D_rounded[1], D_rounded[2]
    
    # or we may use simply x, y, z = D_rounded
    
    # Print the rounded values for x, y, z
    print(f' The solution x={x}, y={y}, z={z}')
 The solution vector is [ 4. 10. 14.]
 The solution x=4.0, y=10.0, z=14.0

Now let us optimize the code. np.linalg.solve() avoids computing the inverse explicitly, which can be computationally expensive and numerically unstable. So let us use np.linalg.solve() - a numerically stable method that does not invert the matrix.
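To see why this matters, here is a small illustrative sketch of my own (the Vandermonde matrix below is just a standard example of an ill-conditioned system, not something from this post). On such a matrix, the solve() route is at least as accurate as multiplying by an explicit inverse, and often noticeably better:

```python
import numpy as np

n = 8
A = np.vander(np.linspace(1, 2, n))   # Vandermonde matrices are notoriously ill-conditioned
x_true = np.ones(n)                   # known solution
B = A @ x_true                        # construct B so that AX = B has solution x_true

x_inv = np.linalg.inv(A) @ B          # inverse-based route
x_solve = np.linalg.solve(A, B)       # factorization-based route

print("max error via inverse:", np.max(np.abs(x_inv - x_true)))
print("max error via solve  :", np.max(np.abs(x_solve - x_true)))
```

Internally, solve() uses an LU factorization with partial pivoting rather than forming A⁻¹, which is also cheaper for a single right-hand side.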

import numpy as np

# let, First equation is 2x-y+z=12
# Second equation is    2x+y-z=4
# Third equation is     x+2y-z=10
# in matrix notation - the compact form will be AX=B; X=transpose of [x,y,z] ; B= transpose of [12,4,10]


# Define the coefficient matrix A and constant vector B
A = np.array([
    [2, -1,  1],
    [2,  1, -1],
    [1,  2, -1]
])

B = np.array([12, 4, 10])

try:
    # Solve AX = B using NumPy's linear solver
    solution = np.linalg.solve(A, B)

    # Round the solution to 4 decimal places for display
    solution_rounded = np.round(solution, 4)

    # Extract rounded values
    x, y, z = solution_rounded

    # Display the clean, formatted output
    print(f"The solution is: x = {x}, y = {y}, z = {z}")

except np.linalg.LinAlgError:
    print("No unique solution exists (the matrix is singular).")
The solution is: x = 4.0, y = 10.0, z = 14.0

Example 3: Let us take an example where det(A) = 0

import numpy as np

# let, First equation is 2x-y+z=12
# Second equation is     2x+y-z=4
# Third equation is      2x-y+z=10  (same left-hand side as the first equation, so A is singular)
# in matrix notation - the compact form will be AX=B; X=transpose of [x,y,z] ; B= transpose of [12,4,10]


# Define the coefficient matrix A and constant vector B
A = np.array([
    [2, -1,  1],
    [2,  1, -1],
    [2, -1,  1]
])

B = np.array([12, 4, 10])

try:
    # Solve AX = B using NumPy's linear solver
    solution = np.linalg.solve(A, B)

    # Round the solution to 4 decimal places for display
    solution_rounded = np.round(solution, 4)

    # Extract rounded values
    x, y, z = solution_rounded

    # Display the clean, formatted output
    print(f"The solution is: x = {x}, y = {y}, z = {z}")

except np.linalg.LinAlgError:
    print("No unique solution exists (the matrix A is singular).")
No unique solution exists (the matrix A is singular).
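As a side note (my own sketch, going slightly beyond the post): the matrix rank tells us exactly why solve() failed here, and np.linalg.lstsq() can still return a least-squares "best fit" even when A is singular:

```python
import numpy as np

A = np.array([[2, -1,  1],
              [2,  1, -1],
              [2, -1,  1]])
B = np.array([12, 4, 10])

# Only 2 of the 3 rows are linearly independent, hence no unique solution
print(np.linalg.matrix_rank(A))   # → 2

# lstsq minimizes ||AX - B|| instead of demanding an exact solution
X, residuals, rank, sv = np.linalg.lstsq(A, B, rcond=None)
print(np.round(X, 4))
```

This distinction matters later for linear regression, which is itself a least-squares problem rather than an exact-solution problem.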

Now, let us construct/formulate a problem:

Generate random (x, y) pairs where y = f(x) + d (d is a random value varying from -r to +r), f being a linear function.

For simplicity, let us reduce the range of d (taking an integer d that varies from 0 to +r), and take the function f(x) = 4x + 9, where x is also an integer lying between 0 and +p. Let us write the code:

import numpy as np
import pandas as pd
import random

# Set a random offset (generated once, outside the loop, so every pair shares the same offset)
r_d = random.randint(0, 16)

# List to store the (x, y) pairs
data = []

# Generate 100 pairs
for i in range(100):
    r_x = random.randint(0, 256)
    r_y = 4 * r_x + 9 + r_d
    data.append((r_x, r_y))

# Create a DataFrame
df = pd.DataFrame(data, columns=['x', 'y'])

# Display the first few rows
print(df.head())

# we can save the 100 values into a csv file
#df.to_csv("random_data.csv", index=False)
     x    y
0   46  206
1   49  218
2  106  446
3  135  562
4  236  966
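The same data generation can also be written without an explicit loop. Here is a vectorized sketch using NumPy's Generator API (the seed value is an arbitrary choice of mine for reproducibility; note that this version draws a fresh offset d per pair, unlike the loop above, which reused one offset):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)             # seeded generator; seed 42 is an arbitrary choice

x = rng.integers(0, 257, size=100)          # 100 integers, x in [0, 256]
d = rng.integers(0, 17, size=100)           # a fresh offset per pair, d in [0, 16]
y = 4 * x + 9 + d                           # y = f(x) + d, elementwise

df = pd.DataFrame({'x': x, 'y': y})
print(df.head())
```

Vectorized generation is both shorter and faster than appending tuples in a Python loop.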

Now, let us create a scatter plot and get ready for linear regression. (We slightly changed the equation and now generate d inside the loop - once per pair - so that more deviation, i.e. more scatter, can be seen.)

import numpy as np
import pandas as pd
import random
import matplotlib.pyplot as plt




# Create and store data
data = []
for i in range(100):
    r_d = random.randint(0, 316)  # Generate random offset
    r_x = random.randint(0, 456)
    r_y = 4 * r_x + 99 + r_d
    data.append((r_x, r_y))

# Create DataFrame
df = pd.DataFrame(data, columns=['x', 'y'])

# Print sample
print(df.head())

# Save to CSV (optional)
#df.to_csv("random_data.csv", index=False)

# Visualize
plt.figure(figsize=(8, 5))
plt.scatter(df['x'], df['y'], color='blue', alpha=0.6, label='Generated points')
plt.title('Scatter Plot of Generated Data')
plt.xlabel('x')
plt.ylabel('y')
plt.grid(True)
plt.legend()
plt.tight_layout()
plt.show()
     x     y
0    7   272
1  199  1066
2   16   176
3  375  1729
4  259  1254

[Scatter plot of the generated (x, y) points]

Next, we will implement Linear Regression on it
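As a small preview of what that fit should recover (a sketch of my own, not the upcoming implementation): since the data was generated from y = 4x + 99 + d, a simple least-squares line fit such as np.polyfit should return a slope close to 4, with the intercept absorbing the average of the random offset d:

```python
import numpy as np

# Regenerate noisy data the same way as above: y = 4x + 99 + d, d in [0, 316]
rng = np.random.default_rng(0)   # arbitrary seed for reproducibility
x = rng.integers(0, 457, size=100).astype(float)
y = 4 * x + 99 + rng.integers(0, 317, size=100)

# Ordinary least-squares fit of a degree-1 polynomial
slope, intercept = np.polyfit(x, y, 1)
print(round(slope, 2))   # should come out close to 4
```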

This post is licensed under CC BY 4.0 by the author.
