Wanted: Data scientists. No math chops? No problem.

The phenomenal growth in demand for Big Data talent is apparently set to continue. A recent survey of Fortune 500 companies, by consultants New Vantage Partners, found that 85% have either launched Big Data projects or are planning to do so, and that their spending on data analysis will jump by an average of 36% over the next several years. No wonder, then, that Harvard Business Review, in an article last October, called data analytics “the sexiest job of the 21st century.”

via Wanted: Data scientists. No math chops? No problem. – Ask Annie – Fortune Management.


Disconnected: My year without the Internet

We are using the Internet wrong. Smartphones turn people into horrible listeners. And cat videos aren’t as riveting as we think they are.

These are just some of the revelations writer Paul Miller had during a year of self-imposed exile from the Internet.

Miller came back online May 1 after giving up the Internet for a year and documenting his experiences for tech site The Verge. After a nerve-wracking start (including finding 22,000 e-mails in his inbox), Miller is settling comfortably back into the Web’s black hole of information and nonstop chatter.

We talked to Miller about what he learned on the other side, what’s changed online in the past year, and how his dream of being a cyborg won’t involve Google Glass.

via Disconnected: My year without the Internet – CNN.com.


Exploring Windows Azure Drives, Disks, and Images

With the preview of Windows Azure Virtual Machines, we have two new special types of blobs stored in Windows Azure Storage: Windows Azure Virtual Machine Disks and Windows Azure Virtual Machine Images. And of course we also have the existing preview of Windows Azure Drives. In the rest of this post, we will refer to these as storage, disks, images, and drives. This post explores what drives, disks, and images are and how they interact with storage.
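
Because disks and images are themselves blobs, you can inspect them with ordinary blob client code. A minimal sketch, assuming the .NET Storage Client Library 2.0 and the conventional "vhds" container where the platform keeps Virtual Machine disks (the container name and connection string are placeholders, not from the original post):

```csharp
using System;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

class ListVmDisks
{
    static void Main()
    {
        // Placeholder connection string; replace with your storage account credentials.
        CloudStorageAccount account = CloudStorageAccount.Parse("UseDevelopmentStorage=true");
        CloudBlobClient client = account.CreateCloudBlobClient();

        // VM disks are page blobs, typically kept in a container named "vhds".
        CloudBlobContainer vhds = client.GetContainerReference("vhds");
        foreach (IListBlobItem item in vhds.ListBlobs())
        {
            CloudPageBlob disk = item as CloudPageBlob;
            if (disk != null)
                Console.WriteLine("{0} ({1} bytes)", disk.Uri, disk.Properties.Length);
        }
    }
}
```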

via Exploring Windows Azure Drives, Disks, and Images – Windows Azure Storage Team Blog – Site Home – MSDN Blogs.


Designing Great Cloud Applications

I get strange looks when I talk to developers about the difference between developing an application to a product versus developing an application to a service. The application you write on premises is written to a piece of software purchased, installed, and configured on a piece of computer hardware that you privately own. The application you write in the cloud is written to a set of services that are available for you, as well as the public, to exploit. So let’s explore how they are different.

via Designing Great Cloud Applications – Windows Azure – Site Home – MSDN Blogs.


Data, Data, Data: Thousands of Public Data Sources

We love data, big and small, and we are always on the lookout for interesting datasets. Over the last two years, the BigML team has compiled a long list of sources of data that anyone can use. It’s a great list for browsing, importing into our platform, creating new models, and just exploring what can be done with different sets of data.

via Data, Data, Data: Thousands of Public Data Sources | The Official Blog of BigML.com.


8 Essential Best Practices in Windows Azure Blob Storage

Binary Large OBject (BLOB) storage is the usual way of storing file-based information in Azure. Blobs are charged according to outbound traffic, storage space, and the operations performed on storage contents. This means that the way you manage Blob Storage will affect both cost and availability.

The Windows Azure platform has been growing rapidly, both in terms of functionality and number of active users. Key to this growth is Windows Azure Storage, which allows users to store several different types of data for a very low cost. However, this is not the only benefit: it also provides a means to automatically scale data to deliver seamless availability with minimal effort.

via 8 Essential Best Practices in Windows Azure Blob Storage.


How to use blob storage – Windows Azure feature guide

This guide will demonstrate how to perform common scenarios using the Windows Azure Blob storage service. The samples are written in C# and use the Windows Azure Storage Client Library for .NET (Version 2.0). The scenarios covered include uploading, listing, downloading, and deleting blobs. For more information on blobs, see the Next steps section.
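
As a taste of what the guide covers, here is a minimal sketch of those four scenarios against the Version 2.0 client library. The container name, blob name, file paths, and the development-storage connection string are all placeholders:

```csharp
using System;
using System.IO;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

class BlobQuickstart
{
    static void Main()
    {
        // Placeholder connection string; substitute your storage account credentials.
        CloudStorageAccount account = CloudStorageAccount.Parse("UseDevelopmentStorage=true");
        CloudBlobClient client = account.CreateCloudBlobClient();

        // Create the container if it does not already exist.
        CloudBlobContainer container = client.GetContainerReference("mycontainer");
        container.CreateIfNotExists();

        // Upload: write a local file's contents into a block blob.
        CloudBlockBlob blob = container.GetBlockBlobReference("myblob.txt");
        using (FileStream source = File.OpenRead(@"local.txt"))
        {
            blob.UploadFromStream(source);
        }

        // List: enumerate every blob in the container.
        foreach (IListBlobItem item in container.ListBlobs())
        {
            Console.WriteLine(item.Uri);
        }

        // Download: read the blob back into a local file.
        using (FileStream target = File.OpenWrite(@"copy.txt"))
        {
            blob.DownloadToStream(target);
        }

        // Delete the blob.
        blob.Delete();
    }
}
```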

via How to use blob storage – Windows Azure feature guide.


Introduction to R, a video series by Google

Google released a 21-part short video series that introduces R. Most of the videos are about two minutes long, with none going over six, and each one focuses on a single task or concept. So this could be a good way to start. Just open R, start a video, and follow along.

via Introduction to R, a video series by Google.


Hadoop tutorial: How to access Hive in Pig with HCatalog in Hue

What is HCatalog?

Apache HCatalog is a project that enables non-Hive scripts to access Hive tables. You can then load tables directly with Pig or MapReduce without having to redefine the input schemas, track down the data location, or duplicate the data.

Hue comes with an application for accessing the Hive metastore from within your browser: the Metastore Browser. Databases and tables can be browsed, and created or deleted, with a few wizards.

The wizards were demonstrated in the previous tutorial about how to analyse Yelp data. Hue uses HiveServer2, rather than HCatalog, to access the Hive Metastore, because HiveServer2 is the new secure, multi-concurrency server for Hive and it already includes a fast Hive Metastore API.

HCatalog connectors are, however, useful for accessing Hive data from Pig. Here is a demo of accessing the Hive example tables from the Pig Editor.
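
A minimal sketch of such a script, assuming the sample_07 example table that ships with Hue (the table and column names are placeholders if your install differs):

```pig
-- Load a Hive table through HCatalog: no schema to redefine, no data location to know.
sample = LOAD 'sample_07' USING org.apache.hcatalog.pig.HCatLoader();

-- The Hive schema comes along with the table, so columns are usable by name.
high_pay = FILTER sample BY salary > 100000;

DUMP high_pay;
```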

via Hue – Hadoop User Experience – The Apache Hadoop UI – Tutorials and Examples for Hadoop, HBase, Hive, Impala, Oozie, Pig, Sqoop and Solr — Hadoop tutorial: how to access Hive in Pig with HCatalog in Hue.


Running a TPC-C workload on SQL Server

When you want to simulate a TPC-C based workload, you have to do two things:

  • Create the necessary database with the initial data
  • Run the TPC-C workload against the created database

Let’s have a more detailed look at both of these steps. Before you can create the actual database, you have to tell the tool which database system you are working with. Hammerora supports the following database systems:

  • Oracle
  • MySQL
  • Microsoft SQL Server

You can set your database through the menu option Benchmark/Benchmark Options.
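
Once the schema build (the first step above) finishes, a quick sanity check is to verify the row counts, which the TPC-C specification fixes relative to the number of warehouses W you configured. A minimal T-SQL sketch, assuming Hammerora's lowercase table names (adjust if yours differ):

```sql
-- Expected cardinalities per the TPC-C spec, for W configured warehouses:
SELECT COUNT(*) AS warehouses FROM warehouse;   -- W
SELECT COUNT(*) AS districts  FROM district;    -- 10 * W
SELECT COUNT(*) AS customers  FROM customer;    -- 30,000 * W
SELECT COUNT(*) AS items      FROM item;        -- 100,000 (fixed)
```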

via Running a TPC-C workload on SQL Server – SQLServerCentral.