Credit Card Fraud Detection
The problem is to spot fraudulent credit card transactions so that card
companies' customers aren't charged for products they didn't buy. This has
become a major issue because purchases can now be made online with nothing more
than your credit card information, which makes fraud detection critical for any
bank or financial business. Even before two-step verification was adopted for
online purchasing in the United States in the 2010s, many American retail
website users were victims of online transaction fraud. When a data breach
leads to monetary theft, it also costs the company its customers' loyalty and
its reputation, putting organizations, consumers, banks, and merchants at risk.
This is also one of the most approachable data science project ideas for
beginners to work on.
In 2017, unauthorized card operations claimed 16.7 million victims. The goal
is to develop a classifier that can determine whether a proposed transaction is
fraudulent.
The key obstacles in detecting credit card fraud are:
- Massive amounts of data are gathered every day, and the model must be fast
enough to respond to the scam in time.
- The data is imbalanced: the vast majority of transactions (99.8%) are not
fraudulent, which makes the fraudulent ones extremely difficult to find.
- Data availability is limited, as the data is generally private.
- Misclassified data is another big concern, as not every fraudulent
transaction is detected and reported.
- Scammers use adaptive techniques against the model.
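To see why the 99.8% figure makes detection hard, consider a model that labels
every transaction as valid. The sketch below uses made-up labels (998 valid and
2 fraudulent, chosen only to match the 99.8% share above) and plain Python;
`always_valid` is a hypothetical baseline, not part of this project's model.

```python
# Hypothetical labels: 0 = valid, 1 = fraud (998 valid, 2 fraud = 99.8% valid)
labels = [0] * 998 + [1] * 2

def always_valid(transactions):
    """Baseline that never flags anything as fraud."""
    return [0 for _ in transactions]

predictions = always_valid(labels)

# Accuracy: share of predictions that match the true label
correct = sum(1 for y, p in zip(labels, predictions) if y == p)
accuracy = correct / len(labels)

# Recall: share of actual fraud cases that were caught
caught = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
recall = caught / sum(labels)

print("accuracy:", accuracy)  # 0.998
print("recall:", recall)      # 0.0
```

The baseline looks excellent on accuracy while catching no fraud at all, which
is why accuracy alone says little on data this imbalanced.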
Overview:
Fraud can be committed in a variety of ways and in a wide range of industries.
In this data science project we use NumPy, scikit-learn, and a few other Python
modules to address the challenge of recognising fraudulent credit card
transactions. To make a decision, most detection systems combine several fraud
detection datasets to build a connected picture of both legitimate and invalid
payment data. We approached the challenge by building a binary classifier and
experimenting with several approaches to find the one that best fits the
problem.
IP address, geolocation, device identity, "BIN" data, global
latitude/longitude, historical transaction trends, and the actual transaction
information must all be considered when making this decision. In practice, this
means merchants and issuers detect fraud analytically, by applying a set of
business rules or analytical algorithms to internal and external data. The
dataset used here has 31 columns. Due to confidentiality concerns, 28 of the
features are the output of a PCA transformation; the only features that PCA
did not change are "Time" and "Amount."
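A business rule of the kind mentioned above can be as simple as a few
threshold checks. This is a minimal sketch under stated assumptions: the field
names (`amount`, `ip_country`, `card_country`, `known_device`), the 1000
threshold, and the function `rule_flags` are all hypothetical and are not taken
from the dataset or from any real issuer's rule set.

```python
def rule_flags(txn, amount_limit=1000.0):
    """Return the list of (hypothetical) rules a transaction trips."""
    flags = []
    # Rule 1: unusually large amount
    if txn["amount"] > amount_limit:
        flags.append("large_amount")
    # Rule 2: IP geolocation disagrees with the card's issuing country
    if txn["ip_country"] != txn["card_country"]:
        flags.append("country_mismatch")
    # Rule 3: device has not been seen for this cardholder before
    if not txn["known_device"]:
        flags.append("new_device")
    return flags

# Illustrative transactions, invented for this example
ordinary = {"amount": 42.5, "ip_country": "US", "card_country": "US",
            "known_device": True}
suspicious = {"amount": 2500.0, "ip_country": "RO", "card_country": "US",
              "known_device": False}

print(rule_flags(ordinary))    # []
print(rule_flags(suspicious))  # ['large_amount', 'country_mismatch', 'new_device']
```

In practice, rules like these would more likely feed a score or sit alongside a
learned model than decide a transaction on their own.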
Credit card fraud detection with data science is a process in which a data
science team investigates data and develops a model that uncovers and prevents
fraudulent transactions. This is accomplished by combining all relevant aspects
of cardholder transactions, such as the date, user zone, product category,
amount, provider, and the client's behavioral patterns. The data is then fed
into a model that is gradually trained to look for patterns and rules that
determine whether a transaction is fraudulent. Fraudsters are always inventing
new fraud patterns, particularly to adapt to fraud detection systems. Models
that are never updated are therefore insufficient, because they do not account
for changes and trends in client spending patterns, such as those across
holiday seasons and geographic regions. Fraud monitoring and detection systems
are used by all major banks, including Chase.
Importing all the necessary Libraries
# import the necessary packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib import gridspec
Loading the Data
# path to the csv file
data = pd.read_csv("credit.csv")
Understanding the Data
# Grab a peek at the data
data.head()
Describing the Data
# Print the shape of the data
print(data.shape)
print(data.describe())
Imbalance in the data
# Split the rows by class: 1 = fraud, 0 = valid
fraud = data[data['Class'] == 1]
valid = data[data['Class'] == 0]
# Ratio of fraud cases to valid cases
outlierFraction = len(fraud) / float(len(valid))
print(outlierFraction)
print('Fraud Cases: {}'.format(len(fraud)))
print('Valid Transactions: {}'.format(len(valid)))
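Note that `outlierFraction` divides by the number of valid rows, so it is the
fraud-to-valid ratio rather than the share of all transactions. With heavy
imbalance the two are close but not identical. A small sketch with made-up
counts (these are not the counts from `credit.csv`):

```python
# Hypothetical class counts, chosen only for illustration
n_fraud = 2
n_valid = 998

# Ratio of fraud rows to valid rows (what outlierFraction computes)
fraud_to_valid = n_fraud / n_valid

# Share of fraud rows among all rows
fraud_share = n_fraud / (n_fraud + n_valid)

print("fraud / valid: {:.6f}".format(fraud_to_valid))  # 0.002004
print("fraud / total: {:.6f}".format(fraud_share))     # 0.002000
```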
Print the amount details for fraudulent transactions.
print("Amount details of the fraudulent transaction")
print(fraud.Amount.describe())
Print the amount details for normal transactions.
print("Amount details of valid transaction")
print(valid.Amount.describe())
Plotting the Correlation Matrix
# Correlation matrix
corrmat = data.corr()
fig = plt.figure(figsize = (12, 9))
sns.heatmap(corrmat, vmax = .8, square = True)
plt.show()
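By default, `data.corr()` computes the pairwise Pearson correlation between
columns. As a rough illustration of what each cell of the heatmap holds, here
is that coefficient computed by hand for two short, invented columns; `pearson`
is a hypothetical helper written for this example, not a pandas function.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance numerator and the two sums of squared deviations
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Invented values for illustration
amount = [10.0, 20.0, 30.0, 40.0]
doubled = [20.0, 40.0, 60.0, 80.0]    # rises exactly with amount
reversed_ = [40.0, 30.0, 20.0, 10.0]  # falls exactly as amount rises

print(round(pearson(amount, doubled), 6))    # 1.0
print(round(pearson(amount, reversed_), 6))  # -1.0
```

Values near 1 or -1 show up as the strongest colours in the heatmap, while
values near 0 indicate little linear relationship.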
Separating the X and the Y values
# Divide the data into input parameters and the output value
X = data.drop(['Class'], axis = 1)
Y = data["Class"]
print(X.shape)
print(Y.shape)
xData = X.values
yData = Y.values
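Scikit-learn is used to create a Random Forest model.
# Split the data into training and testing sets
# (assumption: the split step is missing from the original; the 80/20
# ratio and the seed below are illustrative choices)
from sklearn.model_selection import train_test_split
xTrain, xTest, yTrain, yTest = train_test_split(
    xData, yData, test_size = 0.2, random_state = 42)
from sklearn.ensemble import RandomForestClassifier
# random forest model creation
rfc = RandomForestClassifier()
rfc.fit(xTrain, yTrain)
# predictions
yPred = rfc.predict(xTest)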
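With so few fraud cases, a purely random split can leave the test set with very
few positives. scikit-learn's `train_test_split` accepts a `stratify` argument
for this; the sketch below shows the idea in plain Python so the mechanics are
visible. `stratified_split` is a hypothetical helper written for this
illustration, and the labels are made up.

```python
import random

def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return (train_idx, test_idx) keeping each class's share in both parts."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        # Collect and shuffle the row indices of this class
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Send the same fraction of every class to the test set
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return train_idx, test_idx

# Made-up labels: 990 valid (0) and 10 fraud (1)
labels = [0] * 990 + [1] * 10
train_idx, test_idx = stratified_split(labels)

print(len(train_idx), len(test_idx))      # 800 200
print(sum(labels[i] for i in test_idx))   # 2 fraud rows in the test set
print(sum(labels[i] for i in train_idx))  # 8 fraud rows in the training set
```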
Creating a variety of evaluation metrics
# Evaluating the classifier
# printing every score of the classifier
from sklearn.metrics import classification_report, accuracy_score
from sklearn.metrics import precision_score, recall_score
from sklearn.metrics import f1_score, matthews_corrcoef
from sklearn.metrics import confusion_matrix
n_outliers = len(fraud)
n_errors = (yPred != yTest).sum()
print("The model used is Random Forest classifier")
acc = accuracy_score(yTest, yPred)
print("The accuracy is {}".format(acc))
prec = precision_score(yTest, yPred)
print("The precision is {}".format(prec))
rec = recall_score(yTest, yPred)
print("The recall is {}".format(rec))
f1 = f1_score(yTest, yPred)
print("The F1-Score is {}".format(f1))
MCC = matthews_corrcoef(yTest, yPred)
print("The Matthews correlation coefficient is {}".format(MCC))
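The scores printed above all derive from the four cells of the confusion
matrix. The following sketch recomputes them by hand from made-up counts (not
results from this project), which can help when checking what scikit-learn
reports. `scores_from_counts` is a hypothetical helper, and it assumes every
denominator is non-zero.

```python
import math

def scores_from_counts(tp, fp, fn, tn):
    """Compute common binary-classification scores from confusion counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)   # flagged transactions that really were fraud
    recall = tp / (tp + fn)      # fraud transactions that were flagged
    f1 = 2 * precision * recall / (precision + recall)
    # Matthews correlation coefficient uses all four cells
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "mcc": mcc}

# Invented counts: 100 fraud cases among 10,000 transactions
scores = scores_from_counts(tp=80, fp=20, fn=20, tn=9880)
for name, value in scores.items():
    print("{}: {:.4f}".format(name, value))
```

With these invented counts the accuracy is 0.996 even though one fraud case in
five is missed, which is why precision, recall, F1, and MCC are reported
alongside it.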
# printing the confusion matrix
LABELS = ['Normal', 'Fraud']
conf_matrix = confusion_matrix(yTest, yPred)
plt.figure(figsize =(12, 12))
sns.heatmap(conf_matrix, xticklabels = LABELS,
            yticklabels = LABELS, annot = True, fmt ="d")
plt.title("Confusion matrix")
plt.ylabel('True class')
plt.xlabel('Predicted class')
plt.show()
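When reading the heatmap, it helps to know that scikit-learn's
`confusion_matrix` puts true classes on rows and predicted classes on columns,
so for labels 0 and 1 the layout is [[TN, FP], [FN, TP]]. A plain-Python sketch
of the same tally, using invented labels and a hypothetical helper
`confusion_counts`:

```python
def confusion_counts(y_true, y_pred):
    """2x2 tally with rows = true class, columns = predicted class."""
    matrix = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix

# Invented labels: 0 = Normal, 1 = Fraud
y_true = [0, 0, 0, 0, 1, 1, 1, 0]
y_pred = [0, 0, 1, 0, 1, 0, 1, 0]

matrix = confusion_counts(y_true, y_pred)
(tn, fp), (fn, tp) = matrix
print(matrix)  # [[4, 1], [1, 2]]
```

The bottom-left cell (FN) counts fraud that slipped through, and the top-right
cell (FP) counts legitimate transactions that were flagged.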
Final lines
Fraud is a serious issue for the entire credit card business, and it is
becoming more prevalent as electronic money transfers become more common. In
this Python data science project we built a binary classifier using the Random
Forest technique to detect fraudulent credit card transactions. Credit card
issuers should consider implementing advanced fraud prevention and fraud
detection methods to guard against criminal actions such as the leakage of bank
account information, skimming, and counterfeit credit cards, which lead to the
theft of billions of dollars annually and the loss of reputation and customer
loyalty.
Through this project we learned and used strategies for handling class
imbalance, and we obtained a 99 percent accuracy rate. Based on information
about each cardholder's behavior, data science-based methods can continuously
improve the accuracy of fraud protection. Because some fraudsters commit fraud
once through online channels and then switch to other methods, fraud detection
systems also need unsupervised learning to flag online transactions that match
no previously seen pattern.
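One common strategy for class imbalance is random undersampling: keep every
fraud row and only a random subset of the valid rows for training. The sketch
below is an assumption about what such a step could look like, not the method
used in this project; `undersample` is a hypothetical helper and the labels are
invented.

```python
import random

def undersample(labels, ratio=1.0, seed=0):
    """Return row indices keeping all fraud rows and ratio x that many valid rows."""
    rng = random.Random(seed)
    fraud_idx = [i for i, y in enumerate(labels) if y == 1]
    valid_idx = [i for i, y in enumerate(labels) if y == 0]
    # Never ask for more valid rows than exist
    n_keep = min(len(valid_idx), int(len(fraud_idx) * ratio))
    kept_valid = rng.sample(valid_idx, n_keep)
    return sorted(fraud_idx + kept_valid)

# Invented labels: 995 valid (0) and 5 fraud (1)
labels = [0] * 995 + [1] * 5
kept = undersample(labels, ratio=2.0)

print(len(kept))                     # 15 rows: 5 fraud + 10 valid
print(sum(labels[i] for i in kept))  # 5
```

A step like this would normally be applied to the training split only, so that
the test set keeps the real class mix.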