DroidCollector is a framework for generating and collecting Android malware
traffic. It captures traffic from mobile apps, which are actively installed on each
collection machine. The framework controls traffic collection via a hierarchical management mechanism. Multiple traffic
collection machines work simultaneously. Each collection machine in this framework runs multiple threads to collect mobile
network traffic.
We propose an effective and automatic malware detection method using
the text semantics of network traffic. In particular, we consider each HTTP flow generated by mobile apps as a text
document, which can be processed by natural language processing to extract text-level features. Then, we use the text
semantic features of network traffic to develop an effective malware detection model. In an evaluation using 31706 benign
flows and 5258 malicious flows, our method outperforms several existing approaches, and detects 99.15% of malicious flows.
We also conduct an experiment to verify that the method requires only a few samples to achieve a good detection result.
When the detection model is applied in a real environment to detect unknown applications in the wild, experimental
results show that our method performs significantly better than several popular anti-virus scanners, with a detection rate
of 54.81%.
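
To illustrate the idea of treating an HTTP flow as a text document, the following Python sketch tokenizes one flow and builds a bag-of-words count. It is only a minimal illustration under stated assumptions: the tokenization rule, the function names `flow_to_tokens` and `bag_of_words`, and the example flow are hypothetical and are not taken from the method described above, which may use different text features and a different classifier.

```python
import re
from collections import Counter
from typing import List

# Assumption: words are maximal runs of letters and digits; the source
# does not state how flows are split into words.
TOKEN_RE = re.compile(r"[A-Za-z0-9]+")


def flow_to_tokens(http_flow: str) -> List[str]:
    """Treat the raw text of one HTTP flow as a document; return lowercase tokens."""
    return [token.lower() for token in TOKEN_RE.findall(http_flow)]


def bag_of_words(http_flow: str) -> Counter:
    """Count how often each token occurs in the flow (a simple text-level feature)."""
    return Counter(flow_to_tokens(http_flow))


# Hypothetical HTTP request, invented for illustration only.
example_flow = (
    "GET /ad/fetch?imei=123456&os=android HTTP/1.1\r\n"
    "Host: ads.example.com\r\n"
    "User-Agent: Dalvik\r\n"
)

features = bag_of_words(example_flow)
```

Counts like these could then be passed to any standard text classifier; which classifier and feature-selection step to use is left open here.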
The prevalence of mobile malware has become a growing issue given the tight integration of mobile systems with daily life. Most malware programs use URLs inside network traffic to forward commands that launch malicious activities. Detecting malicious URLs is therefore essential to deterring such activities. Traditional methods construct blacklists of verified URLs to identify malicious URLs, but their effectiveness is impaired by unknown malicious URLs. Recently, machine learning-based methods have been proposed for malware detection with improved performance. In this paper, we propose a novel URL detection method based on the Floating Centroids Method (FCM), which integrates supervised classification and unsupervised clustering in a coherent manner. The proposed method uses the lexical features of a URL to effectively identify malicious URLs while grouping similar URLs into the same cluster. Our experimental results show that a URL cluster exhibits unique behavioral patterns that can be used for malware detection with high accuracy. Moreover, the proposed behavioral clustering method facilitates the identification of malicious URL categories and unseen malware variants.
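
As a rough illustration of what lexical features of a URL can look like, the Python sketch below computes a few simple ones with the standard library. The particular features, the function name `lexical_features`, and the example URL are assumptions chosen for illustration; the source does not list the features it uses, and this sketch covers feature extraction only, not the FCM classifier or the clustering step.

```python
from urllib.parse import parse_qsl, urlparse


def lexical_features(url: str) -> dict:
    """Compute a small, hypothetical set of lexical features for one URL."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    # Non-empty path segments, e.g. "/x/y.php" -> ["x", "y.php"]
    segments = [seg for seg in parsed.path.split("/") if seg]
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots_in_host": host.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "path_depth": len(segments),
        "num_query_params": len(parse_qsl(parsed.query, keep_blank_values=True)),
    }


# Hypothetical URL, invented for illustration only.
example = lexical_features("http://a1.example.com/x/y.php?id=7&k=v")
```

A vector built from such values could serve as input to a classifier or clustering method; the choice of features here is not claimed to match the proposed method.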
University of Jinan