The use of machine learning for personal privacy and security

Posted on Nov 24, 2017

Abstract

Privacy is fundamental to being human. We are social animals: we need to connect with other people, share ideas, and let others know about our ideologies and beliefs. Technological advancement brings rapid change and improves our way of life, but it also grows in complexity and introduces new risks and challenges. As more and more aspects of our lives are automated, users are tasked with tough, consequential privacy and security decisions. Malware is a worldwide epidemic. [1] Studies suggest that malware is the biggest threat to online security and privacy.

This report sheds light on privacy and security challenges in the information age and on human behavior towards privacy, highlights the shortcomings of current solutions, and discusses the potential benefits of using artificial intelligence and machine learning as a personal security assistant to protect against cyber-attacks.

Introduction

 

The Internet is one of the fastest-growing areas of technical infrastructure development. [2] It is growing in size and complexity. New technologies are introduced at a very fast pace, forcing everybody to upgrade. Each new technology introduces new attack surfaces and new privacy and security challenges.

As people go about their online activities, they face a number of difficult privacy and security decisions. These range from whether to install a smartphone app that requires certain permissions, to how to set permissions on social network activity, to whether to click on a link in an email. Through our online activities we knowingly or unknowingly disclose information about our beliefs, traits, interests, and intentions to commercial entities, governments, and people who should not have this information. Individuals are confronted daily with complex privacy and security decisions, and it is often extremely difficult to weigh all the factors involved, to know exactly what vulnerabilities they expose themselves to when they interact with a system, or to know how much and what kind of data is being collected and how that data can be used. Security and privacy are rarely end-users’ primary tasks, and users have limited mental resources to evaluate all possible options and consequences of their actions. [3]

Malware is a major threat on the internet today: millions of hosts are infected, and thousands more are infected daily. Unfortunately, the increasing quantity and diversity of malware have defeated classic security techniques such as anti-virus and anti-spyware. [4]

This report makes the following major contributions.

  • Identifies the complexity of online privacy decision making and human behavior towards privacy and security
  • Identifies major cyber threats
  • Proposes the use of artificial intelligence and machine learning to address malware and privacy issues in the online world
  • Proposes an artificially intelligent client-server architecture that would examine a web link and predict the behavior of visiting it

 

Related work

 

The internet has become a significant part of our lives. More and more parts of our lives are being automated every day, and technology is evolving at an extremely fast rate.

Because IT infrastructure and technologies change rapidly, security and privacy research tends to become outdated quickly. Some problems get solved, but new ones arise as a consequence, or vulnerabilities are found in newly introduced technologies. [5] A growing body of research has investigated individuals’ choices in the presence of privacy and information security trade-offs, the decision-making hurdles affecting those choices, and ways to mitigate those hurdles. [6] [7][10] But the problem of online privacy and security remains unsolved, partly because the word privacy itself remains ambiguous: it means different things to different people. People hold different views about the disclosure of information and about its collection and storage by mega-corporations and governments.

Although the problem of privacy from an online-transaction perspective remains unsolved, a lot of research is being done to strengthen system security and mitigate the risk of cyber-attacks. Most of this research concerns malware detection, because malware is a major threat on the internet today: millions of hosts are infected, and thousands more are infected daily. [8][9] Malware exists in many forms and serves many incentives; it is found on servers, personal computers, and mobile devices such as phones and tablets. The problem persists despite the existence of protection technologies such as anti-virus, cryptography, and hashing. These technologies reduce the threat significantly, but malware remains a widespread epidemic. Malware detection is performed through either anomaly-based or signature-based detection. Signature-based detection is becoming difficult because modern malware is designed in multiple layers and most of it has self-update functionality, which is extremely difficult to detect. [11] The research community has proposed artificial intelligence solutions based on behaviour analysis, anomaly detection, resource usage, and power usage, but each has its own drawbacks, and personal computers have limited memory and processing resources. [12][13] Machine learning has been recognized as a promising technique, but its inner workings are mostly a black box: little is understood about how its results, often highly accurate, are produced. For this reason, and because of insufficient precision and recall, these techniques cannot yet be used in place of traditional anti-virus software.

Information technology and computer science form an ever-evolving field. Most research focuses on problems such as malware detection, web security, and intrusion detection, whereas most of the literature about incidents, if reviewed critically, points to the human problem.

Technologies exist to protect us online, although not completely. Following best security practices and keeping up to date with the security industry can significantly reduce the risk of becoming a victim of a cyber-attack. Even with the best antivirus software, we would still be vulnerable if we used weak passwords.

Our solution focuses on the human problem rather than on the technology: we propose an artificially intelligent system that would automate our online transactions and follow best security practices without human intervention.

Information security and human behavior

 

Individuals manage their public and private spaces in various ways: through reserve, distinctiveness, and anonymity, but also through deception and disguise. People establish these boundaries for many reasons, such as protection against social influence and control and the need for intimacy. Advances in technology have made the collection and use of individuals’ personal data almost invisible. As a result, people often do not know what kind of data is being collected about them, what information corporations and governments hold about them, how that data can be used, or what the consequences are. Because people are unaware of the data collected and the ways it can be used, they are often uncertain about how much information to share.

According to the Australian Cyber Security Centre’s 2016 threat report, “Cybercrime remains a pervasive threat worldwide and to Australia’s national interests and prosperity. Australia’s relative wealth and high use of technology such as social media, online banking, and government services make it an attractive target for serious and organized criminal syndicates”. [15] Cybercrime rises as more and more aspects of our lives move online. Infographics published by ISACA summarise the scale of the cyber threat. In 2014, MIT conducted a digital currency experiment to study consumer behavior towards commercial and government surveillance. Its results illustrate the privacy paradox: people say they care deeply about privacy, but in practice they give away their private data very easily when small incentives are involved. [17][22]

A growing body of research investigating individuals’ choices in the presence of privacy and information security trade-offs has found that security and privacy are rarely end-users’ primary tasks, and that users have limited mental resources to evaluate all possible options and consequences of their actions. [9][14]

A client/server artificial intelligent system

 

When we study online privacy, security, and human behavior, we find that preventive measures and technology exist, but most successful cyber-attacks result from poor human choices. The best antivirus software won’t save us if our online banking password is “password”. Most of the research community focuses on malware detection, malware behavior, and anti-spyware, whereas there is very little focus on the human factor.

Due to the complexity of information systems, privacy decisions have become extremely complex, and humans have limited knowledge and mental resources to weigh all possible consequences of an online transaction. We need technology that assists human choice in the online world.

Through our literature review we found that most malware is propagated through web links and email attachments, which are also typically delivered as web links. To tackle this problem, we propose an artificially intelligent client-server architecture that would examine a web link and predict the behavior of visiting it.

Client/Server Architecture

 

The proposed system will scan the behavior of web links and warn the user before they navigate to any link. It will also stop unwanted ads and popups, which are another means of malware propagation. Links and their observed behavior will be saved on the server, so once one client visits a page and it has been processed and tagged as malicious or safe, all other users benefit from that processing. We will also record other data about each link, such as the server address, the location of the server, the number of hits, the type of content the website serves, and its advertising behavior, such as popups and the origin of the advertiser.

We hope to scan the web for malicious activity. The collected data would provide valuable insight, and the neural network would help isolate the malicious part of the web and provide useful predictions for new links.
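The sharing of verdicts between clients described above can be sketched as follows. This is only a minimal illustration, assuming an in-memory store: the names `LinkRecord`, `VerdictStore`, and `toy_scan` are hypothetical, and the scanning rule is a placeholder rather than the behavioural analysis the report proposes.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class LinkRecord:
    """What the server remembers about one link (field names are illustrative)."""
    url: str
    verdict: str                          # "malicious" or "safe"
    server_address: Optional[str] = None  # could be filled in by a real scanner
    hits: int = 0                         # how many times clients asked about it


@dataclass
class VerdictStore:
    """In-memory stand-in for the central server's link database."""
    scan: Callable[[str], str]            # hypothetical scanner returning a verdict
    records: Dict[str, LinkRecord] = field(default_factory=dict)
    scans_run: int = 0

    def check(self, url: str) -> LinkRecord:
        record = self.records.get(url)
        if record is None:
            # Only the first report of a link triggers the expensive scan.
            self.scans_run += 1
            record = LinkRecord(url=url, verdict=self.scan(url))
            self.records[url] = record
        record.hits += 1
        return record


def toy_scan(url: str) -> str:
    # Placeholder rule for the demo only; it does not inspect real behaviour.
    return "malicious" if "evil" in url else "safe"


store = VerdictStore(scan=toy_scan)
first = store.check("http://evil.example/download")   # scanned once
second = store.check("http://evil.example/download")  # answered from the store
```

In this sketch the second lookup reuses the stored verdict, which is the benefit the architecture aims for: the cost of processing a link is paid once and shared by every client.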

 

 

 

Figure 2: Central server collects links from clients and processes them for malicious behavior

 

 

 

Client

 

The client agent of the proposed solution is responsible for scanning links in all incoming emails and for sending the links extracted from email and online activity to the web server.
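As a rough illustration of the link-scanning step, the sketch below pulls URLs out of a raw email message using Python's standard `email` package and a regular expression. The function name `extract_links`, the URL pattern, and the sample message are assumptions made for this example; a real agent would need more careful parsing of HTML parts and obfuscated links.

```python
import re
from email import message_from_string
from typing import List

# Simple pattern for http/https URLs; an assumption, not a complete URL grammar.
URL_PATTERN = re.compile(r"""https?://[^\s<>"']+""")


def extract_links(raw_email: str) -> List[str]:
    """Return the unique URLs found in the text parts of an email, in order."""
    message = message_from_string(raw_email)
    links: List[str] = []
    for part in message.walk():
        # Skip multipart containers and non-text attachments.
        if part.get_content_maintype() != "text":
            continue
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        text = payload.decode(charset, errors="replace")
        for url in URL_PATTERN.findall(text):
            url = url.rstrip(".,;:!?)")  # drop trailing sentence punctuation
            if url not in links:
                links.append(url)
    return links


SAMPLE = (
    "From: a@example.com\n"
    "To: b@example.com\n"
    "Subject: hello\n"
    "Content-Type: text/plain; charset=utf-8\n"
    "\n"
    "See http://example.com/offer, or https://example.org/login.\n"
    "Again: http://example.com/offer\n"
)
links = extract_links(SAMPLE)
```

The resulting list is what the client would forward to the web server for analysis.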

Another function of the client agent is to monitor resource usage activity.

The following screenshots show different types of resource monitoring activity.

Figure 3 CPU Usage

 

CPU usage provides useful insight into how the central processing unit is used while different apps are running. We can monitor overall activity as well as usage by individual processes. This data can be used to identify the usage patterns of malicious software.

 

 

 

Figure 4 RAM (Primary memory) Usage

 

 

RAM activity will be used to identify RAM-intensive applications and, combined with CPU usage, will help identify malicious code running on the system.

We can also isolate the memory used by the operating system and utility software, and monitor the memory usage of any given process or thread.

 

 

 

Figure 5 Network Usage

 

 

Figure 6 Secondary memory (hard drive) read/write usage

Network and disk read/write data will be monitored and used as training data for the AI classifier. Once the classifier has enough training data, it will detect malicious activity through changes in power, memory, and network usage. Known malicious samples can be identified through resource monitoring even if their names or signatures have been changed.

Resource monitoring is a defensive measure that can identify members of known families of malicious software and would help in predicting and identifying unknown families.
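To make the idea of spotting unusual resource usage concrete, here is a small sketch that flags metrics deviating strongly from a baseline. It uses a plain z-score test as a stand-in for the trained classifier, which the report does not specify; the metric names, sample values, and the threshold of 3.0 are all assumptions for illustration.

```python
from statistics import mean, pstdev
from typing import Dict, List, Tuple

Sample = Dict[str, float]  # one reading per metric, e.g. CPU %, RAM %, MB/s


def build_baseline(samples: List[Sample]) -> Dict[str, Tuple[float, float]]:
    """Compute (mean, standard deviation) per metric from normal readings."""
    baseline = {}
    for metric in samples[0]:
        values = [s[metric] for s in samples]
        baseline[metric] = (mean(values), pstdev(values))
    return baseline


def anomalous_metrics(sample: Sample,
                      baseline: Dict[str, Tuple[float, float]],
                      threshold: float = 3.0) -> List[str]:
    """Return the metrics whose z-score exceeds the threshold."""
    flagged = []
    for metric, (mu, sigma) in baseline.items():
        if sigma == 0:
            deviates = sample[metric] != mu
        else:
            deviates = abs(sample[metric] - mu) / sigma > threshold
        if deviates:
            flagged.append(metric)
    return flagged


# Made-up readings standing in for a machine's normal behaviour.
normal = [
    {"cpu": 10.0, "ram": 40.0, "net": 1.0, "disk": 2.0},
    {"cpu": 12.0, "ram": 42.0, "net": 1.2, "disk": 2.5},
    {"cpu": 11.0, "ram": 41.0, "net": 0.8, "disk": 1.5},
    {"cpu": 9.0, "ram": 39.0, "net": 1.0, "disk": 2.0},
]
baseline = build_baseline(normal)
suspicious = {"cpu": 95.0, "ram": 41.0, "net": 30.0, "disk": 2.0}
flags = anomalous_metrics(suspicious, baseline)
```

A check like this depends only on behaviour, not on file names or signatures, which is why renaming a sample would not by itself hide it.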

 

Web Server

 

The webserver has two main responsibilities.

  • Processing and storing data provided by the clients.
  • Using a sandbox in a virtual machine to monitor the behaviour of visiting web links.

 

 

Data Flow

Each client holds baseline patterns of non-malicious resource usage. When a client detects resource behaviour that differs from this baseline, it sends the data to the server. Clients also send every link visited and all links found in email.

The server then processes the links in a virtual environment. Various aspects of each link are monitored, such as the data the server requests, cookies, session information, the data being downloaded, the behaviour and type of that data, and information about the server, such as its IP address.
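Some of the recorded attributes can be derived from the link itself before any sandboxed visit takes place. The sketch below shows one possible shape for such a record, using only Python's standard library; the function name `link_features` and the chosen fields are assumptions, and the behavioural observations made inside the virtual environment are not covered here.

```python
import ipaddress
from urllib.parse import parse_qs, urlsplit


def link_features(url: str) -> dict:
    """Collect static facts about a link from its URL string alone."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    try:
        # A bare IP address in place of a domain name is worth recording.
        ipaddress.ip_address(host)
        host_is_ip = True
    except ValueError:
        host_is_ip = False
    return {
        "scheme": parts.scheme,
        "host": host,
        "host_is_ip": host_is_ip,
        "port": parts.port,
        "path_depth": len([seg for seg in parts.path.split("/") if seg]),
        "query_params": sorted(parse_qs(parts.query)),
        "uses_tls": parts.scheme == "https",
    }


features = link_features("http://192.0.2.10:8080/a/b/setup.exe?id=7&ref=mail")
```

A record like this could be stored next to the sandbox observations and later serve as part of the input to the classifiers.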

Data Processing

Two AI classifiers are trained: one to monitor the behaviour of links and predict malicious ones, and one for malicious pattern recognition.

Artificial intelligence algorithms and the different types of artificially intelligent solutions are discussed by Selma Dilek, Hüseyin Çakır and Mustafa Aydın in [18].

After training, the two classifiers will be able to give users useful insight before they click on a web link and to block links that are malicious. Through resource monitoring, they would also help detect malware propagated from other sources.
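The report does not name a specific learning algorithm, so the following sketch uses a nearest-centroid classifier purely as a simple stand-in to show the train-then-predict flow for link verdicts. The three features (whether the host is a raw IP address, number of redirects, number of popups) and all training values are invented for the example.

```python
from math import dist
from typing import Dict, List, Sequence


def train_centroids(
    examples: Dict[str, List[Sequence[float]]]
) -> Dict[str, List[float]]:
    """Average the feature vectors of each label into one centroid."""
    centroids = {}
    for label, vectors in examples.items():
        count = len(vectors)
        centroids[label] = [sum(column) / count for column in zip(*vectors)]
    return centroids


def predict(centroids: Dict[str, List[float]], vector: Sequence[float]) -> str:
    """Return the label whose centroid is closest to the given vector."""
    return min(centroids, key=lambda label: dist(centroids[label], vector))


# Hypothetical features per link: [host_is_ip, redirects, popups]
training = {
    "safe": [[0.0, 1.0, 0.0], [0.0, 2.0, 1.0], [0.0, 1.0, 1.0]],
    "malicious": [[1.0, 6.0, 4.0], [1.0, 5.0, 5.0], [0.0, 7.0, 3.0]],
}
centroids = train_centroids(training)
verdict_new_link = predict(centroids, [1.0, 5.0, 4.0])
verdict_plain_link = predict(centroids, [0.0, 1.0, 0.0])
```

The same flow would apply to the second classifier, with resource usage readings in place of link features. A neural network, as mentioned earlier in the report, would replace `train_centroids` and `predict` while leaving the surrounding data flow unchanged.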

Evaluation

 

A growing body of research investigating individuals’ choices in the presence of privacy and information security trade-offs has found that security and privacy are rarely end-users’ primary tasks, and that users have limited mental resources to evaluate all possible options and consequences of their actions. Humans are the weakest link in the security chain.

Our literature review concluded that the research community is focused on malware detection and mostly discusses different artificial intelligence algorithms and their uses. [19][20][21] There is a fair amount of research on the privacy issues that arise from different aspects of our online behaviour, such as search history, online shopping, and the use of social media. There is very little research on human behaviour towards privacy, the complexity of the problem, and the technical capability of internet users.

Our solution addresses this gap by introducing a framework that alerts users to the maliciousness of a website or web link. It would also detect malicious resource usage and alert the user.

Lack of time did not allow the development of a prototype, but the research discussed in [18] by Selma Dilek, Hüseyin Çakır and Mustafa Aydın provides a detailed analysis of artificial intelligence algorithms and their effectiveness.

Our solution collects data from every device that uses our client agent. This yields a large amount of training data for the artificially intelligent classifier, which should greatly improve the effectiveness of the AI algorithms.

To our knowledge, a solution like ours, which aids human behaviour online, does not exist. This is the gap addressed in this report, and we believe that closing it will significantly improve security and reduce the risk of cyber-attack.

Conclusion

 

The Internet is the fastest-developing technical infrastructure. It has greatly improved our way of life, but its growth and complexity have introduced new problems. Cyber-attacks, disruption of services, theft of intellectual property, identity theft, and privacy violation are some of the biggest problems in the online space.

Because IT infrastructure and technologies change rapidly, security and privacy research tends to become outdated quickly, and each new technology introduces new problems. Preventive measures and technology exist, but most successful cyber-attacks result from poor human choices. A typical internet user does not have the technical knowledge and capability to evaluate the consequences that arise from an online transaction.

Our solution focuses on the human problem and aims to assist human choice online: our framework provides useful insight into web links and warns the user before they visit a malicious website.

In future we would like to derive useful insight from the resource usage data to build better and more efficient software. The data could also be used to design more efficient processing units and memory.

We believe that assisting human choice online, and making it easier for people to make informed decisions about their online activity, would greatly improve cyber security and reduce the risk of cyber-attacks.

 

References

 

  1. Horowitz and D. Lucero, “SYSTEM-AWARE CYBER SECURITY: A SYSTEMS ENGINEERING APPROACH FOR ENHANCING CYBER SECURITY”, INSIGHT, vol. 19, no. 2, pp. 39-42, 2016.
  2. Chen, C. Beaudoin and T. Hong, “Securing online privacy: An empirical test on Internet scam victimization, online privacy concerns, and privacy protection behaviors”, Computers in Human Behavior, vol. 70, pp. 291-302, 2017.
  3. Mangialardo and J. Duarte, “Integrating Static and Dynamic Malware Analysis Using Machine Learning”, IEEE Latin America Transactions, vol. 13, no. 9, pp. 3080-3087, 2015.
  4. Whalen, “This Time, It’s Personal: Recent Discussions on Concepts of Personal Information”, IEEE Security & Privacy Magazine, vol. 10, no. 1, pp. 77-79, 2012.
  5. Chen, C. Beaudoin and T. Hong, “Securing online privacy: An empirical test on Internet scam victimization, online privacy concerns, and privacy protection behaviors”, Computers in Human Behavior, vol. 70, pp. 291-302, 2017.
  6. Rieck, P. Trinius, C. Willems and T. Holz, “Automatic analysis of malware behavior using machine learning”, Journal of Computer Security, vol. 19, no. 4, pp. 639-668, 2011.
  7. A. Saeed, A. Selamat and A. M. A. Abuagoub, “A Survey on Malware and Malware Detection Systems”, International Journal of Computer Applications, vol. 67, no. 16, pp. 25-31, 2013.
  8. Ismail, M. Marsono, B. Khammas and S. Nor, “Incorporating known malware signatures to classify new malware variants in network traffic”, International Journal of Network Management, vol. 25, no. 6, pp. 471-489, 2015.
  9. Acquisti, A., Adjerid, I., Balebako, R., Brandimarte, L., Cranor, L. F., Komanduri, S., Leon, P. G., Sadeh, N., Schaub, F., Sleeper, M., Wang, Y., Wilson, S. 2016. Nudges for Privacy and Security: Understanding and Assisting Users’ Choices Online. 1, 1, Article 1 , September 2016
  10. YIN, H., SONG, D., EGELE, M., KRUEGEL, C., AND KIRDA,E. Panorama: Capturing System-wide Information Flow for Malware Detection and Analysis. In ACM Conference on Computer and Communication Security (CCS)
  11. Saeed, Imtithal, Ali Selamat, and Ali M. A. Abuagoub. “A Survey On Malware And Malware Detection Systems”. International Journal of Computer Applications 67.16 (2013): 25-31. Web.
  12. Sobh, “Hybrid Swarm Intelligence and Artificial Neural Network for Mitigating Malware Effects”, Recent Patents on Computer Science, vol. 7, no. 1, pp. 38-53, 2014.
  13. Bidoki, S. Jalili and A. Tajoddin, “PbMMD: A novel policy based multi-process malware detection”, Engineering Applications of Artificial Intelligence, vol. 60, pp. 57-70, 2017.
  14. “Geek Speak: Secure IT |THWACK”, solarwinds.com, 2017. [Online]. Available: https://thwack.solarwinds.com/community/solarwinds-community/geek-speak_tht/blog/2016/02/05/virtualization-beyond-secure-your-virtual-environments. [Accessed: 01- Feb- 2017].
  15. “Cybersecurity Nexus | Cyber Security Certifications | Cyber Security Education – CSX”, Cybersecurity.isaca.org, 2017. [Online]. Available: https://cybersecurity.isaca.org/. [Accessed: 01- Feb- 2017].
  16. Australian Cyber security Center, “THREAT REPORT”, 2016. Available: https://www.acsc.gov.au/publications/ACSC_Threat_Report_2016.pdf. [Accessed: 23- Dec- 2016].
  17. Athey, C Catilini, C Tucker, “Escaping from Government and Corporate Surveillance. Evidence from the MIT Digital Currency Experiment”, Oct 2016
  18. Selma Dilek, Hüseyin Çakır and Mustafa Aydın, “APPLICATIONS OF ARTIFICIAL INTELLIGENCE TECHNIQUES TO COMBATING CYBER CRIMES: A REVIEW”, International Journal of Artificial Intelligence & Applications (IJAIA), Vol. 6, No. 1, January 2015
  19. Barani, (2014) “A hybrid approach for dynamic intrusion detection in ad hoc networks using genetic algorithm and artificial immune system,” Iranian Conference on Intelligent Systems (ICIS), pp. 1 – 6.
  20. Jiang, M. Frater, J. Hu, (2011) “A Bio-inspired Host-based Multi-engine Detection System with Sequential Pattern Recognition”, Ninth IEEE International Conference on Dependable, Autonomic and Secure Computing, pp. 145 – 150.
  21. S. A. Ansari, M. Inamullah, (2011) “Misbehavior detection in mobile ad hoc networks using Artificial Immune System approach”, IEEE 5th International Conference on Advanced Networks and Telecommunication Systems (ANTS), pp. 1 – 6.
  22. Fang, N. Koceja, J. Zhan, G. Dozier, D. Dipankar, (2012) “An Artificial Immune System for Phishing Detection”, IEEE World Congress on Computational Intelligence (WCCI 2012), pp. 1 – 7
