DroidCollector Framework
   DroidCollector is an Android malware traffic generation and collection framework. It captures traffic from mobile apps that are actively installed on each collection machine, and it controls traffic collection through a hierarchical management mechanism. Multiple traffic collection machines work simultaneously, and each collection machine in the framework runs multiple threads to collect mobile network traffic. We capture packet-level traces in the first five minutes and obtain a traffic dataset with 683.4 GB of benign traffic data and 1,062.9 GB of malicious traffic data. The DroidCollector application dataset contains 150,099 benign applications and 196,760 malicious applications from 797 different malware families, including data from the Drebin project. You can find more details on the malicious application dataset in the paper.
The Architecture Diagram of DroidCollector
Fig. 1: The architecture of DroidCollector
   The traffic collection framework is deployed in the UJN (University of Jinan) campus network. A firewall and a NAT server at the campus gateway protect the traffic collection framework. As shown in Figure 1, DroidCollector consists of three parts: the control unit, the data storage unit (comprising the traffic storage server and the app storage server), and the traffic generation & collection unit. The control unit connects to the traffic generation & collection unit and the data storage unit via a LAN switch. The control unit is responsible for scheduling tasks: it assigns Android apps from the app storage server to the traffic collection machines in the traffic generation & collection unit. All collection machines in that unit work together to complete the traffic collection task. The collected traffic data files are then transferred to the traffic storage server.
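   The scheduling flow described above can be illustrated with a minimal sketch. It is not DroidCollector's actual code: the round-robin assignment policy, the function names (assign_apps, collect_traffic, run_machine, run_collection), and the trace file naming are all assumptions made for illustration, and the real install, run, and capture step is replaced by a placeholder.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import cycle


def assign_apps(apps, machines):
    """Control unit (hypothetical policy): hand out apps to machines round-robin."""
    schedule = {machine: [] for machine in machines}
    for app, machine in zip(apps, cycle(machines)):
        schedule[machine].append(app)
    return schedule


def collect_traffic(machine, app):
    """Placeholder for installing the app, running it, and capturing packets.

    Returns the (hypothetical) name of the trace file that would be produced.
    """
    return f"{machine}/{app}.pcap"


def run_machine(machine, apps, threads=3):
    """Collection machine: process its assigned apps with a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(lambda app: collect_traffic(machine, app), apps))


def run_collection(apps, machines, threads=3):
    """Gather the trace files that would be sent to the traffic storage server.

    For simplicity the machines are visited one after another here; in the
    framework they work simultaneously.
    """
    traces = []
    for machine, batch in assign_apps(apps, machines).items():
        traces.extend(run_machine(machine, batch, threads))
    return traces
```

   For example, run_collection(["a.apk", "b.apk", "c.apk"], ["m1", "m2"]) assigns two apps to m1 and one to m2, then returns one trace file name per app.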
Collection Efficiency for DroidCollector
Fig. 2: Collection efficiency for multithreading on one machine
   We evaluate the time spent on Android traffic collection with different numbers of threads for a fixed number of apps. Because collection machines need to perform some preparation work before collecting network traffic from apps, the time difference is not notable at the beginning of the collection process. Once this preparation is completed, the advantage of multithreading starts to show. According to Figure 2, a collection machine with three threads takes 2,087 minutes to collect the traffic of 800 apps. With six, nine, and twelve threads, collecting the traffic of 800 apps takes 1,260, 829, and 551 minutes, respectively. Clearly, multithreading can drastically improve collection efficiency.
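   To make the reported timings easier to compare, the short sketch below derives throughput and speedup relative to the three-thread run. Only the four timings and the app count come from the text above; the function name speedup_table and the choice of three threads as the baseline are assumptions made for illustration.

```python
# Minutes needed to collect traffic from 800 apps, keyed by thread count
# (values taken from the text above).
MINUTES_FOR_800_APPS = {3: 2087, 6: 1260, 9: 829, 12: 551}


def speedup_table(timings, num_apps=800, baseline_threads=3):
    """Return one row per thread count with throughput and speedup vs. the baseline."""
    base_minutes = timings[baseline_threads]
    rows = []
    for threads, minutes in sorted(timings.items()):
        rows.append({
            "threads": threads,
            "minutes": minutes,
            "apps_per_hour": num_apps * 60 / minutes,
            "speedup": base_minutes / minutes,
        })
    return rows


if __name__ == "__main__":
    for row in speedup_table(MINUTES_FOR_800_APPS):
        print(f"{row['threads']:>2} threads: {row['minutes']:>5} min, "
              f"{row['apps_per_hour']:.1f} apps/hour, "
              f"{row['speedup']:.2f}x vs. 3 threads")
```

   By this calculation, twelve threads complete the same workload roughly 3.8 times faster than three threads, which is close to the fourfold increase in thread count.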
More information and the download link for the DroidCollector traffic dataset are available here.