Designed & developed applications for Hadoop & AWS platforms that saved 100 hours of manual work. Developed a Spark application to reliably process 10 TB of data daily. Built a master-slave architecture on AWS using SQS, auto-scaling, EC2 & S3. Performed unit & integration testing for the project. Used Kafka queues to publish results. Collaborated with the Hadoop and Cloud infrastructure teams for rapid development.
● Feed Efficacy: Built a Django- and Bootstrap-based website to visualize feed efficacy with graphs & charts. Developed an alert system to notify stakeholders of anomalies in feeds. Built automation to source relevant data from MS-SQL, PostgreSQL & Hive and store it in a uniform structure for faster web page responses. Built an Oozie app using Sqoop to push data from the Hadoop cluster to a local DB.
● Folder Path Clustering: Conducted research and data analysis to identify potentially malicious folders based on the files they contain. Applied Hive data compression with bucketing and Parquet/ORC file formats in Hadoop. Debugged & tested Python, Java & Hadoop (Spark, Hive, Oozie) applications. Scheduled Oozie jobs to run Hadoop applications. Also served as scrum master for the Agile team.
Built a honeypot system to lure hackers and malware spammers, collecting the latest threats and malware to protect customers from zero-day attacks. Used the LAMP stack & Bootstrap to develop the website, and Python automation to process and analyze the collected malware and threats.