Big Data: Tools and Techniques – Spring 2015
Massachusetts Institute of Technology: Lincoln Laboratory
244 Wood Street, Lexington, MA 02421
Speakers: Vijay N. Gadepally, Jeremy Kepner, Lauren Edwards; MIT Lincoln Laboratory
Dates and Time: Mondays – April 27; May 4, 11, 18 – 6-8PM
By: April 16
After: April 16
Decision: Monday, April 20
Make Checks payable to:
IEEE Boston Section
One Centre Street, Suite 203
Wakefield, MA 01880
The Internet of Things is set to further challenge our ability to collect and analyze big data, which is already a growing problem in the scientific community. The widening gap between the amount of data and users’ ability to process it calls for innovative tools that address the challenges posed by big data volume, velocity, and variety. This tutorial aims to provide researchers and practitioners with a range of tools and techniques that they can use to close this gap. It will build solid fundamentals in applying signal processing techniques such as noise removal, spectral fingerprinting, and background modeling to large unstructured data (big data). It will also show students how to use popular tools such as MATLAB, Octave, the Dynamic Distributed Dimensional Data Model (D4M), Hadoop, and SQL, NoSQL, and NewSQL technologies effectively to solve their big data problems.
The first half of the tutorial will give students a strong understanding of systems engineering, advanced database technologies, and schemas for representing unstructured data, and will introduce new advances in big data processing. The second half will give students hands-on experience with the tools and techniques introduced in the first half, using a representative social media dataset and several code examples. The overarching goal of the tutorial is to provide not only theory but also hands-on experience in working with large data. Special emphasis will be given to rapid prototyping tools and signal processing algorithms.
• Introduction to Systems Engineering
• Introduction to Big Data and Cloud Computing
• Advanced Database Technologies – what is out there
• Key signal processing operations on large, diverse datasets
• Representing big data for signal processing techniques
Case Study: Developing a big data system for social media big data
• Introduction of social media dataset
• Step-by-step design of big data system with emphasis on practical decisions
needed along with hands-on examples
Applying signal processing techniques on Big Data
• Relationship between Signal Processing and Big Data
• Sample signal processing techniques – dimensional analysis and power law analysis
• Hands-on examples using the dynamic distributed dimensional data model (D4M)
toolbox in a MATLAB (or MATLAB-like) environment
The target audience is researchers and practitioners interested in working with large unstructured datasets. Students of all technical backgrounds are welcome. Only minimal programming experience or background in big data is required for the hands-on exercises.
• Learn about the challenges associated with big data;
• Learn about current data trends;
• Gain knowledge of competing technologies and a guide to technology selection;
• Gain hands-on experience developing analytics on large data sets.
Vijay N. Gadepally – Dr. Vijay Gadepally is a Technical Staff Member at the MIT Lincoln Laboratory. Vijay pursues research in the areas of big data, machine learning, high performance computing, and pattern recognition. Vijay has previously worked as a Post-Graduate Intern with Raytheon Company, a visiting scholar with the Rensselaer Polytechnic Institute, and an Intern with the Indian Institute of Technology, Mumbai. Vijay holds an M.Sc. and PhD in Electrical and Computer Engineering from The Ohio State University. At Ohio State, Vijay’s research focused on the estimation of driver behavior for autonomous vehicle applications and high performance computing. His dissertation in signal processing focused on developing mathematical models to accurately estimate and predict driver behavior to improve the safety of autonomous vehicles operating in a mixed-urban environment. At Ohio State, Vijay held concurrent appointments with the Department of Electrical and Computer Engineering and the Ohio Supercomputer Center and was the recipient of a 2012 Outstanding Graduate Student award. Vijay Gadepally holds a Bachelor of Technology (B.Tech) degree in Electrical Engineering from the Indian Institute of Technology, Kanpur. For further information: http://vijayg.mit.edu/about-vijay
Jeremy Kepner – Dr. Jeremy Kepner is a Senior Technical Staff Member at MIT Lincoln Laboratory and a Research Affiliate at the MIT Math Department. Prior to joining MIT, he was a DoE Computational Science Fellow at Princeton University, where he received his Ph.D. in astrophysics. Dr. Kepner leads the supercomputing and big data research efforts at MIT Lincoln Laboratory. His team conducts research in a wide range of computing areas and oversees the operation of a variety of supercomputers that serve hundreds of users at MIT. Throughout his career, the focus of Dr. Kepner’s research has been creating and delivering computing systems that require minimal training to operate, thus allowing scientists to be scientists and engineers to be engineers. In addition to two books, both of which are SIAM bestsellers, Dr. Kepner has published over a hundred works in data mining, databases, high performance computing, graph algorithms, cyber security, visualization, cloud computing, random matrix theory, abstract algebra, and bioinformatics. For further information: http://www.mit.edu/~kepner/
Lauren Edwards – Lauren Edwards is an Associate Technical Staff Member at MIT Lincoln Laboratory. Lauren’s interests include big data, machine learning, database technologies, and the application of these to diverse fields. Prior to joining MIT Lincoln Laboratory, Lauren worked as a Product Development Intern at Genscape, where she developed a nearest neighbor model to pinpoint similar past power market days and their corresponding electricity price drivers. Lauren received an M.S. in Industrial Mathematics from The University of Massachusetts Lowell, focusing on computer science applications such as machine learning and algorithms, and a B.S. in Mathematical Sciences from Worcester Polytechnic Institute, where she explored biological applications of mathematical modeling.
Material to be distributed to participants: Tutorial slides will be distributed to participants. Potentially, a virtual machine with preloaded data and tools or downloadable GNU Octave/MATLAB toolbox and dataset will be provided to students who wish to take part in hands-on exercises.