Confusion matrix and Intrusion Detection System
CONFUSION MATRIX
A confusion matrix is a table used to evaluate the performance of a classification model on a given set of test data. It can only be built if the true values for the test data are known. The matrix itself is easy to understand, but the related terminology can be confusing.
For a binary classifier, it is a 2 x 2 table of actual values against predicted values, with the following four cases:
— True Positive (TP)
- The predicted value matches the actual value.
- The actual value was positive and the model predicted a positive value.
— True Negative (TN)
- The predicted value matches the actual value.
- The actual value was negative and the model predicted a negative value.
— False Positive (FP) — Type 1 error
- The predicted value does not match the actual value.
- The actual value was negative but the model predicted a positive value.
- Also known as a Type 1 error.
— False Negative (FN) — Type 2 error
- The predicted value does not match the actual value.
- The actual value was positive but the model predicted a negative value.
- Also known as a Type 2 error.
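The four cases can be sketched in a few lines of Python. This is only an illustration: the function name `outcome` and the sample labels are made up for this example, with True standing for the positive class.

```python
from collections import Counter

def outcome(actual: bool, predicted: bool) -> str:
    """Name the confusion-matrix cell for one (actual, predicted) pair."""
    if predicted:
        # The model said "positive": right if the sample really is positive.
        return "TP" if actual else "FP"
    # The model said "negative": right if the sample really is negative.
    return "TN" if not actual else "FN"

# Hypothetical test labels (True = positive, False = negative).
actual    = [True, True,  False, False, True, False]
predicted = [True, False, False, True,  True, False]

# Count how many samples fall into each of the four cells.
counts = Counter(outcome(a, p) for a, p in zip(actual, predicted))
for cell in ("TP", "TN", "FP", "FN"):
    print(cell, counts[cell])
```

With these made-up labels, each wrong prediction lands in exactly one of the two error cells, and every sample is counted exactly once.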
ACCURACY IN CONFUSION MATRIX
Accuracy is the fraction of predictions the model got right over the complete test dataset. It is given by the following formula:
Accuracy = ( TP + TN ) / ( TP + TN + FP + FN )
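As a minimal sketch, the formula translates directly into code. The function name `accuracy` and the counts below are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Accuracy = (TP + TN) / (TP + TN + FP + FN)."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("accuracy is undefined for an empty test set")
    return (tp + tn) / total

# Hypothetical counts for a test set of 100 samples:
# 40 + 50 = 90 correct predictions out of 100.
print(accuracy(tp=40, tn=50, fp=4, fn=6))  # 0.9
```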
CYBER CRIME
Cyber crime is any criminal activity that involves a computer, networked device or a network. While most cyber crimes are carried out in order to generate profit for the cyber criminals, some cyber crimes are carried out against computers or devices directly to damage or disable them, while others use computers or networks to spread malware, illegal information, images or other materials. Some cyber crimes do both — i.e., target computers to infect them with a computer virus, which is then spread to other machines and, sometimes, entire networks.
INTRUSION DETECTION SYSTEM (IDS)
An Intrusion Detection System (IDS) is a system that monitors network traffic for suspicious activity and issues alerts when such activity is discovered. It is a software application that scans a network or a system for harmful activity or policy violations. Any malicious activity or violation is typically either reported to an administrator or collected centrally using a security information and event management (SIEM) system. A SIEM system integrates outputs from multiple sources and uses alarm filtering techniques to differentiate malicious activity from false alarms.
Intrusion detection systems (IDSs) are commonly divided into two types: signature-based and anomaly-based. Signature-based intrusion detection relies on comparison with signatures of recognized attacks, which are stored in a database, so it cannot detect unknown attacks. Anomaly-based IDSs, by contrast, use a statistical approach to detect activity that deviates from the usual limits of resource use and typical behavior parameters. The rates of false positives and false negatives remain high for anomaly-based detection.
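The difference between the two approaches can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the signature patterns, the z-score test standing in for the "statistical approach", the threshold of 3 standard deviations, and the sample traffic figures. A real IDS is far more elaborate.

```python
from statistics import mean, stdev
from typing import Optional, Sequence

# Hypothetical signature database: byte patterns of already known attacks.
SIGNATURES = {
    "sql_injection": b"' OR 1=1",
    "path_traversal": b"../../",
}

def signature_match(payload: bytes) -> Optional[str]:
    """Signature-based check: name of the first known pattern in the payload."""
    for name, pattern in SIGNATURES.items():
        if pattern in payload:
            return name
    return None  # unknown attacks are not in the database and pass unnoticed

def is_anomalous(value: float, baseline: Sequence[float],
                 z_limit: float = 3.0) -> bool:
    """Anomaly-based check: is value more than z_limit standard deviations
    away from the mean of the baseline measurements?"""
    mu, sigma = mean(baseline), stdev(baseline)
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > z_limit

# Hypothetical baseline: requests per minute seen during normal operation.
baseline = [98, 102, 100, 97, 103, 101, 99]

print(signature_match(b"GET /index.php?id=1' OR 1=1"))  # sql_injection
print(signature_match(b"GET /index.html"))              # None
print(is_anomalous(100, baseline))                      # False
print(is_anomalous(450, baseline))                      # True
```

The signature check can only flag what is already in its database, while the statistical check can flag traffic it has never seen before, at the price of raising an alarm on any unusual but harmless spike.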
Confusion matrices summarize the actual and predicted classifications produced by a classifier. For an IDS, the four cases are:
— True-Positive (TP): Correctly classify an anomalous sample as attack.
— True-Negative (TN): Correctly classify a non-attack sample as ordinary instance.
— False-Positive (FP): Incorrectly classify an ordinary sample as anomalous instance.
— False-Negative (FN): Incorrectly classify an attack sample as ordinary instance.
Reducing false negatives and false positives is a major research problem, as both harm the overall security of a network: a false negative lets an attack go unnoticed, while false positives bury real alerts under false alarms.
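To see why these two error types deserve separate attention, the sketch below computes them as rates next to accuracy. The rate formulas are the standard definitions (false positive rate = FP / (FP + TN), false negative rate = FN / (FN + TP)) and are not taken from the text above; the function name `ids_rates` and the traffic counts are hypothetical.

```python
def ids_rates(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Accuracy and error rates from confusion-matrix counts."""
    attacks = tp + fn   # samples that really are attacks
    normal = tn + fp    # samples that really are ordinary traffic
    if attacks == 0 or normal == 0:
        raise ValueError("need at least one attack and one ordinary sample")
    return {
        "accuracy": (tp + tn) / (attacks + normal),
        "false_positive_rate": fp / normal,    # ordinary traffic flagged as attack
        "false_negative_rate": fn / attacks,   # attacks that slipped through
    }

# Hypothetical test set: 10,000 connections, of which only 100 are attacks.
rates = ids_rates(tp=60, tn=9702, fp=198, fn=40)
for name, value in rates.items():
    print(f"{name}: {value:.4f}")
```

With these made-up counts the accuracy is close to 98%, yet 40 of the 100 attacks are missed. Because attacks are rare compared with ordinary traffic, accuracy alone can hide a high false negative rate.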
CONCLUSION
Intrusion detection remains critical for network security, and machine-learning-based approaches have given a major boost to finding novel attacks. In recent years, the application of multiple classifiers, i.e. hybrid systems and ensemble learning methods, has considerably increased the accuracy of attack detection techniques. However, the rate of false positives and false negatives still needs to be addressed.