Here we list a set of topics worth considering when working with data.
General Training Material on Data
Hadoop Description: The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.
Hadoop Training Material: To view the training material, please click on the links below:
- A Brief Overview of Hadoop
- Access to IPB Hadoop Cluster
- Hadoop MapReduce
- Hadoop MapReduce Streaming
- Twitter user ranking
- Hands on: Vi-SEEM Introduction to Hadoop
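To give a flavour of the MapReduce programming model that the material above covers, here is a minimal word-count sketch in Python. It is not taken from the training material. It runs in a single process, without Hadoop, and uses an in-memory sort in place of the shuffle and sort phase a real cluster would perform. The function names (map_words, reduce_counts, run_local) are hypothetical and chosen only for illustration.

```python
from itertools import groupby
from operator import itemgetter


def map_words(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reduce_counts(sorted_pairs):
    """Reduce step: sum the counts of each run of identical keys.

    Assumes the pairs arrive sorted by key, as they would after a shuffle.
    """
    for word, group in groupby(sorted_pairs, key=itemgetter(0)):
        yield word, sum(count for _, count in group)


def run_local(lines):
    """Simulate map -> shuffle/sort -> reduce in a single process."""
    pairs = [pair for line in lines for pair in map_words(line)]
    pairs.sort(key=itemgetter(0))  # stands in for the cluster's shuffle and sort
    return dict(reduce_counts(pairs))


if __name__ == "__main__":
    print(run_local(["big data on Hadoop", "Hadoop scales data"]))
```

On a cluster the map and reduce steps would run as separate tasks on different machines, with the framework moving and sorting the intermediate pairs between them. The local version only shows how the two steps fit together.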
DSpace training material
DSpace is the software of choice for academic, non-profit, and commercial organizations building open digital repositories. It is free and easy to install "out of the box" and completely customizable to fit the needs of any organization. DSpace preserves and enables easy and open access to all types of digital content including text, images, moving images, mpegs and data sets. DSpace has an active community of developers and is used by thousands of institutions worldwide.