Meeting 2nd of April

Kamila & Asset

  • NAS is being repaired

TODO

  • automatic mounting of iSCSI volumes

Yana

  • Finished GUI

Georgiy

  • Research further parsing of Nvidia network through PCI

Yerkanat

  • Installing OpenStack

Temirlan

  • Hadoop MapReduce scripts

Alexandra

  • Testing Hadoop; it is working

Almas

  • Network performance benchmark

PACC (report 1)

Hi,

this is the first report on the work I have done for the PACC project.

I joined the group at the beginning of February. The first few weeks I was busy with the 4th floor lab setup. During the 1st week I checked the functionality of the hardware, installed Ubuntu on the lab machines, and reported any hardware-related issues to IT.

During the next week Almas and I prepared the Quiz user accounts needed for the Mobile Computing class. The preparation included network setup, installation of the required software (such as Eclipse, the Android SDK, and drivers), and tests of the proper functionality of those programs. We spent most of the time figuring out the process of folder sharing between users of the same machine and the related permission issues (see the sketch below). Next, we started to prepare a sample user for cloning, but since there was only enough work for one person, Almas took responsibility for that.
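
For the record, the usual Linux approach to this is a directory owned by a common group with the setgid bit set, so that files created inside inherit the group. Here is a minimal sketch in Python; the path /home/shared and the group quiz are made up for illustration, and it has to run as root:

    import grp
    import os
    import stat

    SHARED = "/home/shared"   # hypothetical shared folder
    GROUP = "quiz"            # hypothetical common group

    os.makedirs(SHARED, exist_ok=True)
    # keep the current owner, change only the group
    os.chown(SHARED, -1, grp.getgrnam(GROUP).gr_gid)
    # rwx for owner and group, plus setgid so new files inherit the group
    os.chmod(SHARED, stat.S_IRWXU | stat.S_IRWXG | stat.S_ISGID)

Each account that should see the folder then has to be a member of that group. This only sketches the general idea of group-based sharing between accounts on one machine; it is not necessarily the exact setup we ended up with.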

Then I created a Google Group for our team for the convenience of group announcements, sharing docs, and communication between team members, and I drew up the lab schedule. During that week I also continued working on the hardware, since some of the earlier problems were eventually fixed and the new machines required a Linux installation.

Next, we tried to clone the sample user account onto all machines in the lab. Because of network and IT-policy related issues the process took a lot of time: some computers were not connected to the network and some were not working, all of which created obstacles.

After that I started preparing a GUI for the x11vnc program. I researched the Zenity utility, the functionality provided by x11vnc itself, and the way the Unity sidebar works. After playing with Zenity and Unity I decided to stick with a drop-down list of options that appears on right-clicking an icon, because it turned out to be the most convenient implementation for users.
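
To give an idea of the direction, here is a rough sketch of the Zenity side of that experiment (the option labels are invented, the right-click menu itself is wired up separately through the launcher icon, and only standard x11vnc flags are used):

    import subprocess

    # show a Zenity list dialog; the selected row is printed to stdout
    choice = subprocess.run(
        ["zenity", "--list", "--title", "Screen sharing",
         "--column", "Action",
         "Start sharing", "Start view-only sharing", "Stop sharing"],
        capture_output=True, text=True,
    ).stdout.strip()

    if choice == "Start sharing":
        subprocess.Popen(["x11vnc", "-display", ":0"])
    elif choice == "Start view-only sharing":
        subprocess.Popen(["x11vnc", "-display", ":0", "-viewonly"])
    elif choice == "Stop sharing":
        subprocess.run(["pkill", "-x", "x11vnc"])

Because Zenity reports the selection on stdout, a list dialog is easy to glue to the x11vnc command line with a few lines of wrapper code.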

Part-time Job at PACC (post 1)

Hello to everyone who might be reading my post now!

So, I started my part-time job at the PACC project at the beginning of February.

We started by setting up the computer lab on the 4th floor. We found that several machines did not work properly or were configured in a wrong way, so we needed to check each computer to make sure everything was working.

Then we installed Ubuntu on every machine, but this time it was decided to install Ubuntu alongside Windows. This is just a fresh installation, so there is no authentication system for students. Instead, they just need to work as the Guest or Quiz user (the latter was created on purpose so that students would have all the necessary software installed for them; you can read more about this in other students' blogs on this site). We definitely need an authentication system so that each student can have his or her own account and directories, but that will probably come later.

In summer I had a successful experience of getting LDAP and NFS working together. However, the problem with NFS is that this file system is not distributed. A user can store his files on the machine he was working on, and the files will also be stored on the server. However, if the user wants to use a different machine, all his files on the server have to be copied to the new machine on login. The files on the server and the client machines are kept synchronized, but if the user uses a different machine every time, copying the files from the server might slow down performance.

The solution is to use a distributed file system. I looked at different distributed file systems, and GlusterFS and HDFS (the Hadoop Distributed File System) seem to be the most promising. They are also free and open source, which is important! Here, let me introduce HDFS. Here it comes...

HDFS consists of connected clusters of nodes. A cluster typically has one NameNode, which manages the file system namespace, handles namespace operations, manages access to files by clients, and maps data blocks to the DataNodes that store them. A cluster also has several DataNodes that manage the storage on the nodes they run on. The node that the NameNode runs on is the master node; the nodes that the DataNodes run on are the slaves. If you want more information about the architecture of HDFS, I recommend reading the Hadoop documentation or the introduction to HDFS by developerWorks.

[Figure: HDFS architecture.] The picture is taken from http://www.ibm.com/developerworks/library/wa-introhdfs/.

To install a Hadoop cluster, I advise you to follow either the installation guide from the Hadoop documentation or this guide posted by Michael G. Noll.
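
Once the cluster is up, a quick way to see the NameNode/DataNode split described above in action is the Hadoop command line. The sketch below just wraps two standard commands in Python; the file path is made up, and the hadoop command is assumed to be on your PATH:

    import subprocess

    # ask the NameNode for a listing of the file system namespace
    subprocess.run(["hadoop", "fs", "-ls", "/"])

    # ask which DataNodes hold the blocks of one file (hypothetical path)
    subprocess.run(["hadoop", "fsck", "/user/hduser/gutenberg.txt",
                    "-files", "-blocks", "-locations"])

The first command only touches namespace metadata on the master; the second prints, for each block of the file, the slave nodes storing its replicas.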

There were several problems I encountered during setup. The one that took me the most time was that, when I tried to set up a multi-node cluster, the DataNodes could not launch. As I figured out later, the problem was in the machine's hosts file, "/etc/hosts": the line "127.0.1.1 your_machine's_host_name" came before "actual_network_address your_machine's_host_name", which is why the address was not resolved correctly. So, just put the line with the network address before the 127.0.1.1 line, or comment out the 127.0.1.1 line, since you don't need it. So, even though the problem described caused several more problems downstream, there was a quick and easy fix for it :)
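
To make the fix concrete, here is what the relevant part of /etc/hosts could look like after the change (the host name master and the address 192.168.0.1 are made-up examples):

    127.0.0.1    localhost
    # 127.0.1.1  master       # commented out; it shadowed the real address
    192.168.0.1  master       # the address the DataNodes must resolve

With this ordering, the host name resolves to the network address, and the DataNodes can reach the NameNode.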

Report I

MapReduce in HDFS (Hadoop Distributed File System)

MapReduce is a framework for the distributed computation of tasks using a large number of computers (called "nodes") that form a cluster. Hadoop has its own MapReduce framework, which runs on top of HDFS.

Let's go deeper and look at what happens in each step.

In the Map step (a preprocessing step) the input data is collected. One of the computers (known as the master node) receives the input data of the problem, divides it into parts, and transfers the parts to the other computers (the slave nodes).

In the Reduce step all the preprocessed data is merged. The master node receives the responses from the worker nodes and forms from their results the solution to the problem as it was originally formulated.

Let's look at where we can apply it.

For instance, to data sets of (key, value) pairs such as:

  • city, temperature
  • student, grade
  • car, maximum speed

MapReduce accomplishes this in parallel by dividing the work into independent tasks spread across many nodes (servers). This model would not scale to large clusters (hundreds or thousands of nodes) if the components could share data arbitrarily: the communication overhead required to keep the data on the nodes synchronized would make it inefficient. Instead, the data elements in MapReduce are immutable, meaning that they cannot be updated. For example, if input data is changed during a MapReduce job (e.g., a student's grade or a car's maximum speed is modified), the change is not reflected in the input files; instead, new output (key, value) pairs are generated, which are then forwarded by Hadoop into the next phase of execution (1).

Typically the compute nodes and the storage nodes are the same, that is, the MapReduce framework and the Hadoop Distributed File System run on the same set of nodes. This configuration allows the framework to effectively schedule tasks on the nodes where the data is already present, resulting in very high aggregate bandwidth across the cluster (2).

Here is the typical example of MapReduce, in pseudocode:

    // Function used by slave nodes on the Map step
    void map(String name, String document):
        // Input data:
        //   name     - name of the document
        //   document - internal data of the document
        for each word w in document:
            EmitIntermediate(w, "1");

    // Function used by slave nodes on the Reduce step
    void reduce(String word, Iterator partialCounts):
        // Input data:
        //   word          - a word
        //   partialCounts - grouped list of intermediate results for that word
        int result = 0;
        for each v in partialCounts:
            result += parseInt(v);
        Emit(AsString(result));

I'm going to write a small example program called "WordCount" as a template for further work on our implementation of HDFS in the PACC project.
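
As a preview, here is a minimal WordCount sketch for Hadoop Streaming, where the mapper and reducer are plain scripts that read stdin and write tab-separated (key, value) pairs. This is only a sketch of the idea, not our final implementation:

    #!/usr/bin/env python
    # mapper.py -- Map step: emit (word, 1) for every word of the input
    import sys

    for line in sys.stdin:
        for word in line.strip().split():
            print("%s\t%s" % (word, 1))

    #!/usr/bin/env python
    # reducer.py -- Reduce step: sum the counts for each word; Hadoop
    # sorts the mapper output, so all lines for one word arrive together
    import sys

    current_word, current_count = None, 0
    for line in sys.stdin:
        word, count = line.rstrip("\n").split("\t", 1)
        if word == current_word:
            current_count += int(count)
        else:
            if current_word is not None:
                print("%s\t%d" % (current_word, current_count))
            current_word, current_count = word, int(count)
    if current_word is not None:
        print("%s\t%d" % (current_word, current_count))

The pair can be tested locally without a cluster, e.g. cat input.txt | python mapper.py | sort | python reducer.py, before submitting it to the cluster through the Hadoop Streaming jar.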

Reference:

1) http://developer.yahoo.com/hadoop/tutorial/module4.html#basics

2) http://hadoop.apache.org/docs/r0.18.3/mapred_tutorial.html

3) http://architects.dzone.com/articles/how-hadoop-mapreduce-works

Hello Everybody!

Hello everybody!

My name is Alibek Sailanbayev.

Recently I became a member of your team, nice to meet you!

Now I'm working on a project about "data retrieval".

Regards, Alibek Sailanbayev

Seminar Summary

More than 40 researchers, practitioners, and students attended and participated in the Seminar on Academic Cloud Computing held in the School of Science and Technology at Nazarbayev University on November 20, 2013.

Presenting and in attendance were department heads, faculty, researchers, and students from three national universities (Karaganda State Technical University, S. Seifullin Kazakh Agrotechnical University, and Nazarbayev University), as well as an invited guest speaker from the University of Kassel (Germany).

Also present at the seminar were representatives from governmental agencies and national and multinational companies, including EPAM Systems (USA, KZ), General Communications, Inc. (USA), Hewlett-Packard (US, KZ), JazzSoft (KZ), Kazakhstan Center of Geoinformation Systems, KazCosmos (Kazakhstan’s National Space Agency), KZ-CERT (National Computer Emergency Response Team, KZ), National Information Technologies (KZ), North Caspian Operating Company (KZ), and Samruk Kazyna (National Welfare Fund, KZ).

The seminar attendees were first addressed by Dr. Philippe Frossard, the new Vice-Provost of Research at Nazarbayev University, who provided the opening remarks.

The impetus for the seminar was the dissemination of results achieved and lessons learned from the Private Academic Cloud Computing (PACC) initiative conducted by Dr. Ulrich Norbisrath, Assistant Professor in Computational Sciences at the School of Science and Technology at Nazarbayev University. The seminar featured a keynote address by Dr. Norbisrath, the principal investigator of the PACC project. Also presenting at the seminar were five student interns who were instrumental in the development and organization of the PACC project during the summer of 2013.

Also presenting at the seminar was an invited guest speaker, Dr. Albert Zündorf, Professor for Software Engineering at the University of Kassel, Germany. Dr. Zündorf’s presentation, “Software Engineering the Cloud”, reported on the experience gained from using cloud computing to train neural networks for wind turbine weather forecast prediction. The presentation covered the cloud infrastructure and the lessons learned from its creation.

Other highlights from the seminar included the following short presentations: a discussion on “G-Cloud and R&D at NITEC” given by Altynbek Kalitanov from National Information Technologies; a talk on “GPU Computing – Using the Power of Graphic Cards for High Performance Computing Resources” by Nazarbayev University students Magzhan Ikram and Georgiy Krylov; and a presentation on “Using Amazon Web Services in Research & Teaching – Two Case Studies” by Nurzhan Bakibayev, also from Nazarbayev University.

The seminar culminated with small group discussions addressing the following topics: GPU computing and computer applications in health care; and academic cloud computing student projects that involve industry and university internships and opportunities.

The seminar was characterized by lively discussion among the participants. There were numerous opportunities for networking, exchanging ideas, and collaborative development. General feedback from those who attended the seminar was strongly positive and many agreed that there was great merit in convening future seminars to further develop partnerships and establish potential collaboration as it pertains to the PACC project, especially in the areas of research and teaching.

Funding for the seminar was provided in whole by the PACC project’s grant award made available by the Nazarbayev University Research and Innovation System (NURIS). Additional funding to cover the travel expenses of the internationally invited guest speaker, Dr. Albert Zündorf, was made possible by the German Academic Exchange Service (DAAD).

Final Agenda for the Seminar on Academic Cloud Computing

Date: 20 November 2013
Time: 9:00 – 16:30
Location: Nazarbayev University, Astana, Kazakhstan
Venue: School of Science & Technology, Block 7, 7210

The seminar gathered researchers, practitioners, and students in the area of cloud computing in an academic setting. It showcased presentations on different research projects, a keynote speech by the principal investigator of the Private Academic Cloud Computing project, and a keynote on Software Engineering the Cloud by an invited international guest speaker. The focus of the seminar was on the benefits of cloud computing in the areas of academic research, teaching, and administrative operations.

| **Wednesday, 20 November** | **Speaker** |
| --- | --- |
| **9:00 – 10:00** Registration and Reception | |
| **10:00 – 10:15** Welcome/Opening Remarks | Vice-Provost of Research Dr. Philippe Frossard, Nazarbayev University |
| **10:15 – 11:00** First Keynote: _“Private Academic Cloud Computing – An Experience Report on Using Existing Infrastructure for Cloud Resources at Nazarbayev University”_ | Prof. Dr. Ulrich Norbisrath, Nazarbayev University |
| **11:00 – 11:15** Refreshment Break | |
| **11:15 – 11:35** Introduction to Talks: _“Research in Teaching”_ | Asst. Prof. Dr. Timothy Shipley, Nazarbayev University |
| **11:35 – 12:15** Talk #1: _“Cloud Internship Report”_ | Temirlan Atambayev, Asset Ismagambetov, Alexandra Kim, Kamila Kinayat, and Yerkanat Ramazanov, Nazarbayev University |
| **12:15 – 13:15** Lunch Break | |
| **13:15 – 13:50** Second Keynote: _“Software Engineering the Cloud – Developing Software for a Clean Approach to Using Cost-Effective Cloud Resources for Wind Turbine Forecast Prediction”_ | Prof. Dr. Albert Zündorf, University of Kassel |
| **13:50 – 14:10** Talk #2: _“G-Cloud and R&D at NITEC”_ | Altynbek Kalitanov, National Information Technologies |
| **14:10 – 14:30** Talk #3: _“GPU Computing – Using the Power of Graphic Cards for High Performance Computing Resources”_ | Magzhan Ikram & Georgiy Krylov, Nazarbayev University |
| **14:30 – 14:50** Talk #4: _“Using Amazon Web Services in Research & Teaching – Two Case Studies”_ | Nurzhan Bakibayev, Nazarbayev University |
| **14:50 – 15:05** Refreshment Break | |
| **15:05 – 16:20** Discussion Tables (various topics) | Discussion Group Moderators |
| **16:20 – 16:30** Closing Remarks | |
| **16:30** Adjourn | |

Discussion Table Topics (& Moderators)

The seminar culminated with small group discussions addressing the following topics:

  1. GPU Computing and Computer Applications in Health Care (Magzhan Ikram)
  2. Academic Cloud Computing Student Projects: Industry & University Internships and Opportunities (Laura Paluanova)
  3. Localizing Cloud Services to Run Your Own Dropbox, Email, VOIP System, or Social Network (Ulrich Norbisrath)

Keynote Speaker Biographies

Dr. Ulrich Norbisrath is an Assistant Professor of Computer Science at Nazarbayev University’s School of Science & Technology. His research is in the areas of Ubiquitous Computing and Information Management. He works on ad-hoc networks and spontaneous private clouds as well as a biometric technology broker framework. His Friend-to-Friend (F2F) Computing concept allows for the ad-hoc establishment of scientific- or service-based networks or private clouds. He has developed methods and tools to support migrating legacy desktop and server environments to the cloud and calculating their respective costs. Professor Norbisrath teaches courses on agile software development, graph transformation, systems, and mobile computing.

Dr. Albert Zündorf studied Computer Science at RWTH Aachen University, Germany, from 1984 to 1990. In 1994, he completed his PhD at RWTH Aachen. He then spent six years at the University of Paderborn doing his Habilitation. After two years as a step-in professor at the University of Braunschweig, he became a full professor of software engineering at the University of Kassel in 2002. Albert Zündorf is the initiator and one of the technical leaders of the Fujaba CASE tool project as well as of its successor, SDMLib.