Big Data and Social Media
Advancements in technology have put smartphones, tablets, and computers in the hands of many people around the world. Each day, people use these devices to communicate, schedule appointments, share information and moments with family, and do business. Each day, therefore, millions of people generate data through these devices, thanks to fast wireless and cable internet connectivity in most parts of the world. Chief among the causes of the unprecedented rise in the generation of information is the advent of social media platforms, through which individuals share moments with friends within their social networks. The most popular of these platforms include Facebook, Twitter, Google+, Tinder, Pinterest, Instagram, WhatsApp, and YouTube. While most of these social media platforms were largely web-based, the proliferation of smartphones and tablets with supporting ecosystems, such as Android, iOS, and Windows Phone, has necessitated the creation of social media apps that are native to these devices. With these apps, organizations are able to collect information from users and use it to target advertisements, improve supply chains, and make savings on their business processes (Palmer 2014, n.p.). The information generated from social media platforms is referred to as Big Data: a colossal amount of unprocessed data generated by the increased human digital footprint (Oxford 2014, n.p.). The interaction between social media and Big Data, however, occurs through two important elements: the human and the non-human. These two form the intersection at which humans and technology meet to generate data. The amount of data generated through this interaction is huge: about 400 million tweets are posted on Twitter daily, 350 million photos are posted on Facebook, and about 4 billion videos are viewed on YouTube (Parliamentary Office of Science and Technology 2014, p. 1).
These three platforms are among the most widely used forms of social media, with WhatsApp and Google+ following closely behind. As the biggest social media platform, Facebook generates some of the largest amounts of data daily through content shared on its apps and websites from smartphones, tablets, and computers. This essay will focus on Facebook in relation to Big Data; it will also identify and examine the human and non-human elements of the social media platform in order to gain a deeper understanding of the interaction between these elements and Big Data, as well as the impact of these technologies on social life and culture.
The term “Big Data” has varied definitions depending on who is defining it. A clear understanding of Big Data rests on a combination of three elements: technology, analysis and mythology. In terms of technology, Big Data involves combining computational power and algorithmic precision to gather, analyze, link and compare large sets of data (Viz 2013, p. 3). In terms of analysis, Viz (2013) looks at Big Data as the use of large data sets to identify patterns as a means of making economic, social, technical, and legal claims (p. 3). Finally, in terms of mythology, Big Data is viewed as the notion that large data sets offer better intelligence and knowledge, generating insights that were previously elusive (Viz 2013, p. 3).
Within the current information-centric world, individuals, companies and organizations generate data every day. Many organizations have seen the importance of the vast amounts of data generated and are actively working towards processing this data for their own benefit (Eaton et al. 2012, p. 4). Social media specifically generates vast amounts of information every second, some of which organizations are inclined to use for marketing and other purposes. Other organizations, however, are overwhelmed by the huge amounts of data generated not only by their customers, but by themselves as well. This has created a problem, in that these organizations cannot move fast enough to process all this data. Part of the reason for the exponential increase in data generated by individuals and companies is the advent of instrumentation, wherein people, with their connected devices, are able to see and sense a lot, and in so sensing, endeavor to store what is sensed (Eaton et al. 2012, p. 4). At the heart of data generation, therefore, are the interconnected devices used every day and all the time, together with the intelligence built into most of these gadgets, which allows them to communicate with one another and to store the information they receive and send.
While there was concern over the amount of data coming in, and a fear that humans, given these interconnected devices, would drown in data, the problem seems to have been addressed through the creation of enterprises around Big Data (Fortune 2014, p. 2). The creation of these enterprises is a direct consequence of one of the characteristics of Big Data: volume. According to Eaton et al. (2012), Big Data has three characteristics: volume, variety and velocity (p. 5). It is estimated that around 800,000 petabytes of data were stored around the world at the beginning of the millennium. By 2020, it is estimated that about 40 zettabytes of data will be generated (Cote 2013, p. 9). Most of this data comes from social media, particularly Facebook and Twitter, which combined generate more than 20 terabytes of data every day (Eaton et al. 2012, p. 5).
The volume of data generated, while problematic in itself, gives rise to yet another problem and characteristic of Big Data: variety. The interconnectedness of different systems and devices, the installation of sensors, and the explosion of online activity and of different technologies have brought with them a variety of data. This data today therefore includes traditional relational data as well as raw, semi-structured and unstructured data from online activity (Eaton et al. 2012, p. 7). With such variety in the data streams, traditional relational technologies find it difficult to make sense of this data, given that it does not align with traditional database technologies, which were largely built to handle complexity in related, structured data (Fortune 2014, p. 2; Madden 2012, p. 4). Big Data therefore moves away from traditional structured data towards raw data, using it for decision-making and for gaining insight into the patterns of the humans and the society from which it is generated.
The last of the characteristics of Big Data is the velocity of the data. This is the consideration of how fast data is generated, how fast it is stored and how fast it is analyzed and used for decision-making. According to Eaton et al. (2012), data today is generated at great speed and has an ever shorter shelf life. This constant and fast generation of data presents a problem to enterprises, which must therefore use Big Data techniques to process the data while it is in transit if they are to make any beneficial use of it.
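The idea of processing data while it is in transit can be illustrated with a minimal sketch. It assumes a hypothetical stream of (timestamp, topic) events arriving in time order; the function name, the event format and the 60-second window are illustrative assumptions and are not drawn from Eaton et al. (2012). Rather than storing everything and analyzing it later, the sketch keeps only a short trailing window and updates its counts as each event arrives.

```python
from collections import Counter, deque


def rolling_counts(events, window_seconds=60):
    """Yield (timestamp, counts) after each event.

    `events` is an iterable of (timestamp_in_seconds, topic) pairs in
    time order (a hypothetical format). `counts` covers only the events
    that fall inside the trailing window.
    """
    window = deque()    # events currently inside the window
    counts = Counter()  # topic -> number of occurrences in the window
    for ts, topic in events:
        window.append((ts, topic))
        counts[topic] += 1
        # Evict events that have aged out of the trailing window.
        while window and window[0][0] <= ts - window_seconds:
            _, old_topic = window.popleft()
            counts[old_topic] -= 1
            if counts[old_topic] == 0:
                del counts[old_topic]
        yield ts, dict(counts)
```

Because the function is a generator, a caller can act on each snapshot as soon as it is produced, which mirrors the short shelf life of the data described above.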
Social media is one of the biggest technological advancements today, allowing communication, sharing of information, pictures and video among users of particular social media platforms within their network of friends. Social media therefore is a website, a mobile app or a web tool that not only gives its users information, but also allows the users to interact with the information. The interaction herein can include giving comments on an article, sharing of the article or voting for the article. While these form the simplistic side of social media, other intricate elements of social media include movie, music or book recommendations based on ratings from individuals with similar interests and retweeting of individual comments. Currently, there are numerous social media platforms available for users where they can share photos, send tweets, post and watch videos as well as use the connections for professional networking and recommendations. The popularity, use and penetration of social media across the world are perhaps visible through the numbers relating to social media platforms.
Social media use goes beyond communication among friends and families to marketing. Many organizations have taken to social media as a marketing platform, given the time people spend on social media networks on their phones. According to the 2014 Social Media Marketing Industry Report, about 97 percent of marketers reported having used social media for marketing, while 92 percent agreed on the importance of social media for business (Stelzner 2014, p. 7). So popular and practical are social media platforms that some organizations are replacing their traditional websites with social media pages (Goodfellow 2014). The popularity and widespread use of social media platforms among these organizations stems from the platforms’ ease of use, the fact that they are free and therefore save organizations marketing money, and their penetration and popularity among users, whom these organizations see as potential customers (Goodfellow 2014).
Facebook runs Custom Audiences as a marketing tool that allows organizations to advertise to their customers through Facebook. The first step is creating a list of potential customers in the age group the organization wants to target. Organizations can locate the profiles of individuals of the target age and add their IDs to a list, which then allows the organizations to target these individuals. Moreover, organizations can also build the list from individuals of the target age who visit their websites or use their mobile apps or partner apps, and through this list enable Facebook to deliver ads to those individuals. Custom Audiences is additionally instrumental in consumer behavior analysis, as it allows organizations to predict consumer behavior using the consumers’ online footprint.
Atlas is Facebook’s platform that allows advertisers to measure Facebook’s effectiveness as an advertising platform in comparison to other messaging platforms. It provides advertisers with audience-based targeting and cross-device measurement capabilities, allowing them to correctly measure and attribute users’ conversions. LiveRail, on the other hand, gives publishers the option of using video ads, while Audience Network allows advertisers to target their customers on other networks such as Google+, Twitter, Instagram or LinkedIn, using the same information the customers have on Facebook.
Social media currently drives the bulk of internet traffic; in fact, according to many, social media is the very soul of the internet today (Abdel-Hafez and Xu 2013, p. 59). Given this fact, the human element remains an important part of social media, so much so that its absence would mean the death of social media. Given the human element, many organizations are using social media hashtags for marketing, especially on Facebook. By creating a unique URL for a hashtag, organizations are able to point people to conversations about their products and, through the sentiments people share, make product improvements or changes, or introduce new products.
Social media is a human-centric tool that relies on the human touch for proper functioning. For Facebook, for example, the status updates, shares and likes all require a human touch for their effectiveness. Lawlor (2014) puts this clearly, stating that two-way communication is the foundation of social media. Personalized content, messages and posts therefore are among the very human elements of social media. It is because of the personal touch that social media indeed becomes social. Through these social elements of communication, humans generate social data.
Part of the purpose of social media is to encourage communication and interaction between and among people. Indeed, social media, and Facebook in particular, has changed the traditional communication landscape, empowering grassroots people to play active roles in the economic, social and political spheres of life (Zhao et al. 2013, p. 50). According to Abdel-Hafez and Xu (2013), millions of people use Facebook as a social media platform, and that number has the potential for exponential growth (p. 60). These interactions have so far been facilitated by the many cheaply available devices, such as smartphones, feature phones, tablets and computers, on which different social media platforms can be accessed. These devices then become storage facilities as well as links to other systems, which collect the data generated by the users for storage and analysis as well as for advertising and the prediction of behavior.
Many political parties and movements follow and analyze social media conversations to gauge users’ sentiments towards the party, movement or candidate. The sentiments presented are largely human; however, using Big Data analytics, these political parties are able to discover areas of little support for their cause and therefore concentrate resources on these areas. This was particularly important in the 2012 US presidential election, where the Obama team used Facebook and Twitter sentiments to map out swing voters for the purpose of their campaign (The Parliamentary Office of Science and Technology 2014, p. 2). Additionally, using large data sets, the Obama campaign team generated indexes that helped them distinguish less likely from most likely voters. With such indexes, the campaigners were able to focus their efforts on the most likely voters, leaving out those less likely to vote. The data sets were also instrumental in mapping out Democratic voters in traditionally Republican states and convincing them to vote, which eventually gave President Obama victory over Romney (Tufekci 2014, p. 9).
Thus, many organizations as well as political parties are inclined to use user information for their own benefit, combining psychographics with individual profiles from social media such as Facebook in marketing and political campaigns (Tufekci 2014, p. 21). Both the 2008 and 2012 campaigns used individual modelling based on information from users’ social media profiles and, through big data analytics and computational methods, identified voters’ fears and vulnerabilities in order to target them with campaign messages (Tufekci 2014, p. 21). These users likely did not willingly volunteer this information, but through social media and big data computational analytics, the social media platforms shared it with interested third parties.
Additional political and economic issues that emerge with social media use include the infringement of copyright and intellectual property. Although many users post content such as pictures, stories and works of art on Facebook, it is possible for others to take the content and use it, or benefit from it, without necessarily giving credit to the original content producer. A case in point is Morel, whose Haiti earthquake photos were used by Agence France-Presse (AFP) and Getty without his prior consent.
Although many see social media as revolutionary in facilitating communication, there are many reservations over the infringement of privacy through social media. Political campaigners use correlational studies of political choices and other attributes: by collecting data from social media, campaigners can deliver individualized messages to target individuals, and in so doing sway them towards their political ideologies (Tufekci 2014, p. 4). This is largely unethical, as search histories on Google or Facebook are largely private. Further, by sifting through the information on user profiles and the posts made by users, Facebook has been able to compile comprehensive data profiles on its users and sell this data to corporations for marketing purposes, and to government agencies and law enforcement for racial profiling among other uses (Richards & King 2014, p. 15). Most of this information is private, and by accessing and using it, social media platforms and search engines infringe on users’ privacy, which is evidently unethical.
While the personal touch, messages, likes and shares on Facebook form the very core of the social media platform, there are other underlying non-human elements that, although not necessarily visible to the platform’s users, form an important part of the assemblage. Embedded within social media platforms are technologies that help enhance the user experience, while at the same time learning the users’ likes and preferences as a means of improving the user experience and interface.
As a means of enriching the user experience, one of Facebook’s technologies is the geolocation feature, which allows the application or platform, using GPS technology on the smartphone, computer or tablet, to pinpoint the exact location of the user (Profis 2011). The geolocation feature is part of the platform’s way of enriching the user experience, as through it the platform can suggest nearby locations and events that may be of interest to the user. Part of Facebook’s geolocation technology also includes the Nearby Friends feature, which notifies users when anyone on their friends list is close by. This is in addition to broadcasting a user’s location so that friends in the vicinity can meet up with the user.
Embedded within social media are algorithms that automate the selection of content and help establish its relevance. This is seen in Facebook’s ‘News Feed,’ which displays the Most Recent and Top News items on the user’s landing page, drawn from the friends and pages to which the user is connected (Bucher 2012, p. 1167). Each interaction of the user with the information on the News Feed creates an Edge, and Edges are ranked according to their affinity (for example, how frequently a user checks another user’s profile or sends a private message), weight (a Comment carries more weight than a Like) and time (a recent like or comment is more important than an old one) (Bucher 2012, p. 1167). Additionally, Facebook uses metadata to summarize basic information about data on its own site and on other, non-Facebook sites. Metadata provides information on a particular item on a page, such as an image, a comment or a message, in terms of its description, time, date and location. Facebook Open Graph is the platform’s metadata technology; it provides descriptions of the content of a page as well as keywords linked to the content. For Facebook Open Graph, this information may include images, which are largely displayed in search engines’ results. This way, users gain interest in the pages and click on them. On the downside, however, metadata exposes more information about the individual who has posted the item. The ads on Facebook and other social media sites also use metadata to attract users to pages, as well as to collect information on the users for other purposes.
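The Edge ranking described in this section, with its three factors of affinity, weight and time, can be made concrete with a short sketch. The following is only an illustration under stated assumptions: the numeric weights, the 1 / (1 + age) time decay and the choice to multiply the three factors are hypothetical, and the code does not represent Facebook’s actual implementation. The text only establishes that a Comment outweighs a Like and that recent activity counts for more than old activity.

```python
from dataclasses import dataclass

# Hypothetical weights: the only constraint taken from the text is
# that a comment carries more weight than a like.
EDGE_WEIGHTS = {"like": 1.0, "comment": 2.0}


@dataclass
class Edge:
    kind: str         # "like" or "comment"
    affinity: float   # closeness between viewer and actor (assumed 0..1)
    age_hours: float  # time elapsed since the interaction


def edge_score(edge):
    """Combine affinity, weight and time decay into a single score."""
    decay = 1.0 / (1.0 + edge.age_hours)  # assumed decay; newer is higher
    return edge.affinity * EDGE_WEIGHTS[edge.kind] * decay


def rank_stories(stories):
    """Return story ids ordered from highest to lowest total edge score.

    `stories` maps a story id to the list of Edge objects attached to it.
    """
    totals = {sid: sum(edge_score(e) for e in edges)
              for sid, edges in stories.items()}
    return sorted(totals, key=totals.get, reverse=True)
```

Under these assumed values, a fresh like from a close friend can outrank a comment that is several hours old, which shows how the time factor interacts with the weight factor.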
Big Data and Social Media
The rate and volume of unstructured data produced on social media creates a problem for analysis using traditional methods, which rely on structured sets of data. Databases, which are among the most commonly used traditional methods of data analysis, have proven incapable of handling such huge volumes of unstructured data (The Parliamentary Office of Science and Technology 2014, p. 2). While the large volumes of data generated by sensors and by industrial and domestic networks, among other things, are collectively referred to as Big Data, the narrower category of data generated by the digital human is known as Big Social Data (BSD) (Cote 2013, p. 9). Specifically, BSD is generated by every human communicative action that uses devices for social interaction or for making purchases (Cote 2013, p. 9).
Among the most popular social media platforms are Facebook, with more than 1 billion active users, Twitter, with more than 700 million active users, and LinkedIn, with more than 500 million users, alongside others such as YouTube, Instagram, Pinterest and Google+. Facebook remains the most popular social media platform, with 71 percent of online adults in the US reporting that they use the platform, according to a survey by Pew Research (Duggan et al. 2015). According to the same research, LinkedIn and Pinterest were each used by 28 percent of US adults and Instagram by 26 percent, in contrast with 23 percent for Twitter (Duggan et al. 2015). While these demographics are encouraging, especially to corporations dealing with big data, they present a problem, as not every person whose details these enterprises may want has an account on a social media platform. It is therefore a challenge to get complete information on individuals, given that some individuals are notably absent from all social media platforms.
The amount of data generated by the digital human continues to increase as technology advances and more people get access to devices that aid in the generation of BSD. A few years ago, Google, being the most visited site, processed about 20 petabytes of data each day. With one billion users sharing more than a billion pieces of content daily, liking 300 billion things and uploading 300 million photos, the social media platform alone generates 500 terabytes of social data every day (Cote 2013, p. 9).
Processing this information has proven difficult for traditional relational databases, which largely process data in predetermined schemas of rows and columns. The data processed by these databases is thus clearly defined in relation to other data (Cote 2013, p. 9). BSD, as generated by humans, is on the other hand largely unstructured, and follows only human communication within the realm of cultural meaning (Cote 2013, p. 9). Cote (2013) puts it clearly, indicating that data generated by users is unstructured and does not conform to any predetermined schema. This data is largely symbolic and spontaneous, a true reflection of free human communicative sociality (p. 11).
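The contrast between schema-bound relational data and unstructured social data can be sketched briefly. The example below is an assumption-laden illustration: the table, the sample post and the helper `extract_tokens` are all hypothetical. A purchase record fits a predetermined row-and-column schema and can be queried directly, whereas a free-text post must first be mined for whatever structure it happens to contain.

```python
import re
import sqlite3

# Structured data: every record conforms to a predetermined schema,
# so it can be queried directly with SQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchases (customer_id INTEGER, item TEXT, price REAL)"
)
conn.execute("INSERT INTO purchases VALUES (?, ?, ?)", (1, "book", 12.5))
total = conn.execute("SELECT SUM(price) FROM purchases").fetchone()[0]

# Unstructured data: a hypothetical post with no schema at all.
post = "Loved the new #camera on my phone!! thanks @shop_example :) #photography"


def extract_tokens(text):
    """Pull out the few structured signals a free-text post contains."""
    return {
        "hashtags": re.findall(r"#(\w+)", text),
        "mentions": re.findall(r"@(\w+)", text),
    }
```

Even after extraction, most of the post (its tone, emphasis and emoticon) remains outside any column, which is the difficulty the paragraph above describes.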
To gain any meaningful use of this data, it is important to process it outside the confines of traditional relational technologies (Madden 2012, p. 4). Even so, there are concerns that this data does not provide a complete overview of the population, given that only part of the population uses social media (The Parliamentary Office of Science and Technology 2014, p. 2). The argument here is that this data leaves out some vulnerable groups, such as the elderly and those from lower-income backgrounds.
That aside, BSD shows great potential, with evidence suggesting that it is capable of even more. The 2012 US elections saw in-depth use of BSD, with the news site Politico using sentiment analysis from Facebook as a complement to traditional polling methods. By allowing users to air their views easily and freely, Facebook allows users to take part in the political process, in this case as a complement to traditional polling, which proves more enriching than the use of traditional polling alone (The Parliamentary Office of Science and Technology 2014, p. 2).
Apart from polling, analytics within BSD have also been instrumental in campaigning, as was the case in the 2012 US elections. According to the Parliamentary Office of Science and Technology (2014), using data from Facebook and other social media platforms, BSD analytics were able to profile individuals likely to vote for Obama (p. 3). This way, the campaigners were able to target their resources more effectively. Such analytics have also been employed extensively in marketing and advertising, where social media platforms display ads based on users’ likes and preferences, as well as make suggestions on products and services that users’ friends like (Ferguson 2014, p. 1).
Big Data is a recent phenomenon whose depths have not yet been fully explored. Social media has eased communication between and among people. Through communication on social media, people generate colossal amounts of data, most of which is unstructured. This has brought with it the Big Data phenomenon, with BSD being specifically related to social media. So far, processing BSD has proven difficult for traditional data analytics. However, modern analytics have developed a better capacity to process this data, and while they may not fully process data coming in at high volume, variety and velocity, they have made headway, and the data has so far begun to be used in polling, campaigning, marketing and credit scoring. Big Data remains far from fully exploited, and with the increased use of social media, so does BSD. The advent of Big Data has, however, brought with it concerns over its use on social media. These have included issues of privacy infringement, the sale of user data to third parties and the use of user information for marketing. Although different organisations have so far made headway in exploiting this data, the speed, volume and capacity at which this data is generated requires more advanced analytic systems if this data is to be of help not only to corporations, but to individuals as well.
Abdel-Hafez, A. and Xu, Y. 2013. “A Survey of User Modelling in Social Media Websites.” Computer and Information Science, vol. 6, no. 4, pp. 59-73
Ahlqvist, T. et al. 2008. Social Media Roadmaps. Helsinki: Edita Prima Oy
Bucher, T. 2012. “Want to be on the top? Algorithmic power and the threat of invisibility on Facebook.” New Media & Society, vol. 14, no. 7, pp. 1164-1180
Cote, M. 2013. Data Motility: The Materiality of Big Social Data. Cultural Studies Review
Duggan, M. et al. 2015. Social Media Update 2014. Pew Research Center. Available from http://www.pewinternet.org/2015/01/09/social-media-update-2014/
Eaton, P. C. et al. 2012. Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data. New York: McGraw Hill
Ferguson, C. J. et al. 2014. “Concurrent and prospective analyses of peer, television and social media influences on body dissatisfaction, eating disorder symptoms and life satisfaction in adolescent girls.” Journal of Youth and Adolescence, vol. 43, pp. 1-14
Fortune, S. 2014. A Brief History of Databases. Avant
Goodfellow, C. 2014. “Can social media platforms replace a business website?” The Guardian. Available from http://www.theguardian.com/small-business-network/2014/oct/06/social-media-platforms-replace-small-business-websites
Guimaraex, T. 2014. Revealed: The Demographic Trends for every Social Network. Business Insider. Available from http://www.businessinsider.com/2014-social-media-demographics-update-2014-9
Kietzmann, J. H., et al. 2011. “Social media? Get serious! Understanding the functional building blocks of social media.” Business Horizons, vol. 54, no. 3, pp. 241-25
Madden, S. 2012. From Databases to Big Data. IEEE Internet Computing
Oxford, T. 2014. “Big Data in Social Media: Social Mining Part 2: Finding insight.” Useful Social Media. Available from http://usefulsocialmedia.com/customer-insight/big-data-social-media-social-mining-part-2-finding-insight
Palmer, M. 2014. “Social media and big data come into play.” Financial Times. Available from http://www.ft.com/intl/cms/s/0/05a51650-f316-11e3-a3f8-00144feabdc0.html#axzz3X5I7N1Xi
Parliamentary Office of Science and Technology. 2014. Social Media and Big Data. London: Parliamentary Office of Science and Technology
Preston, J. 2011. “Movement Began with Outrage and a Facebook Page that Gave it an Outlet.” The New York Times. Available from http://www.nytimes.com/2011/02/06/world/middleeast/06face.html?_r=0
Profis, S. 2011. How to stop Facebook from sharing your location. CNET. Available from http://www.cnet.com/how-to/how-to-stop-facebook-from-sharing-your-location/
Richards, N. M. & King, J. H. 2014. Big Data Ethics. Wake Forest Law Review, pp. 1-49
Saleh, I. 2012. “Egypt’s digital activism and the Dictator’s Dilemma: An evaluation.” Telecommunications Policy
Stelzner, M. A. 2014. 2014 Social Media Marketing Industry Report. Social Media Examiner
Tufekci, Z. 2014. “Engineering the public: Big data, surveillance and computational politics.” First Monday, vol. 19, no. 7, pp. 1-20
Viz, F. 2013. “A critical reflection on Big Data: Considering APIs, researchers and tools as data makers.” First Monday, vol. 18, no. 10, pp. 1-12
Zhao, J. J. 2013. “Strategic use of social media on companies’ e-commerce sites.” Journal of Research in Business Education, vol. lv, no. 2, pp. 50-70