Parallel Protein Classification with IBM BigInsights

Büchi, Christof and Mathys, Susanne (2013) Parallel Protein Classification with IBM BigInsights. Student Research Project thesis, HSR Hochschule für Technik Rapperswil.

semesterThesisBuechiMathys.pdf - Supplemental Material

Abstract

Big Data is a growing topic in information technology, driven by the huge volume of data now held on IT systems all over the world. Processing large numbers of large files and analyzing unstructured data in real time could benefit institutions and enterprises that store large volumes of data generated by their transactions.
Handling and analyzing this rapidly growing data pushes existing IT infrastructures beyond their limits. Google and Yahoo! have introduced their own approaches to handling such datasets. Storing massive data efficiently and processing it with minimal overhead requires a completely new architecture that goes beyond well-known, established tools and principles.
Big Data systems and frameworks such as IBM BigInsights with Hadoop provide a distributed, fault-tolerant file system running on commodity hardware. They also allow custom applications to be written in Java based on the MapReduce principle.
How difficult would it be to perform classification with an existing single-process application on a Big Data system? In our research we wanted to show that it is as simple as setting up a cluster and running the tool from a bash script used within a Hadoop streaming job. We examined the overhead of using such a complex framework to run simple applications in parallel. We also looked at how the system scales out as the cluster size grows.
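
To make the Hadoop streaming idea concrete, here is a minimal sketch of a streaming mapper that wraps an external tool. It is not the thesis code: the thesis uses a bash script, while this sketch uses Python, and the input layout (one `id<TAB>sequence` record per line), the function names `classify` and `map_lines`, and the convention of passing the tool command as script arguments are all assumptions made for illustration. Hadoop streaming feeds input records to the mapper on standard input and collects whatever it writes to standard output.

```python
import subprocess
import sys


def classify(sequence, command):
    """Run the external tool once, feeding the sequence on stdin.

    `command` is a hypothetical placeholder for the real classifier
    invocation; it must read from stdin and write its result to stdout.
    """
    result = subprocess.run(
        command, input=sequence, capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def map_lines(lines, command):
    """Yield one 'id<TAB>result' record per non-empty 'id<TAB>sequence' line."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank input lines
        seq_id, sequence = line.split("\t", 1)
        yield f"{seq_id}\t{classify(sequence, command)}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Assumed usage: the arguments after the script name form the tool command.
    for record in map_lines(sys.stdin, sys.argv[1:]):
        print(record)
```

Because each input record is processed independently, a job like this needs no reduce step, and the framework can spread the records across as many mapper processes as the cluster provides. Starting one process per record is the simplest design but adds start-up cost, which is one source of the overhead the abstract mentions.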

Item Type: Thesis (Student Research Project)
Subjects: Area of Application > Industry
Area of Application > Data Mining
Area of Application > Healthcare, Medical Sector
Metatags > ITA (Institute for Internet Technologies and Applications)
Divisions: Bachelor of Science FHO in Informatik > Student Research Project
Depositing User: HSR Deposit User
Contributors:
Contribution | Name | Email
Thesis advisor | Joller, Josef | UNSPECIFIED
Thesis advisor | Kienzler, Romeo | UNSPECIFIED
Date Deposited: 23 Jul 2013 09:20
Last Modified: 23 Jul 2013 09:20
URI: https://eprints.ost.ch/id/eprint/291
