Course Project for Social Sensing

Project Ideas

The project in this course will be open ended. You will propose, carry out, and report on a project in groups of two or three students. The following are some rough ideas for possible projects centered on social sensing. These are just examples to help you get started. You can choose from these examples or come up with your own ideas!

Trust and Credibility Analysis. Online social media (e.g., Twitter, Flickr, Facebook, Foursquare) are designed as open data-sharing platforms for ordinary people. This openness makes them an ideal channel for unreliable content from a large number of unvetted human sources. Given the massive number of Twitter users (e.g., 284 million monthly active users) and the tweets they post (e.g., half a billion tweets per day), it is not simple to determine the trustworthiness of sources and the credibility of their tweets. Therefore, it would be interesting and important to develop new trust and credibility analysis tools that extract accurate and credible information from noisy and unfiltered social sensing data.

Disaster Report and Event Tracking. Due to the popularity and penetration of online social media, people now use them to report the status of disasters and emergency events. For example, in the Boston Marathon bombing in April 2013, the first "report" of the bombing actually came from a tweet made by a witness who was at the scene. The timestamp of that particular tweet is the exact moment the first explosion happened. The rich set of social sensing data in disaster scenarios offers great opportunities to develop real-time situation awareness tools that detect and track the status of disasters in a reliable and timely fashion. Such tools could greatly assist the government in dispatching rescue teams, allocating important resources, and getting useful feedback from ordinary citizens in the aftermath of a disaster.

Social Media Command Center for Business Intelligence. Large companies (e.g., Dell, Cisco, Wells Fargo) and airlines (e.g., Delta, Southwest) have recently started to build dedicated business intelligence teams called social media command centers (SMCCs). In an SMCC, the company's social media team monitors online social media and engages in social conversations around its brand and market. An SMCC allows real-time monitoring of trends in marketing efficiency, customer service and feedback, and risk management, making it easy for passing executives to gauge the social health of the brand at a glance. Therefore, it would be an interesting task to build your own version of a social media command center for your favorite brand or company using freely available online social media data.

Real-Time Twitter-based Encyclopedia. Encyclopedias exist to provide information about a wide range of topics. However, modern encyclopedias become outdated quickly and, in attempting to provide objective information, often gloss over the unique social impact and nuances of each topic. One possible solution is to construct a real-time encyclopedia using Twitter. In particular, because hashtags are widely used on Twitter to associate tweets with particular topics, hashtags can serve as encyclopedia entries. The user enters a desired hashtag and, at runtime, the program analyzes recent, popular user tweets (perhaps from a chosen geographical region) and constructs a current, socially sensed definition of the hashtag in question. The resulting definition is then presented to the user in a clean and legible manner.
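As a starting point for this idea, one very rough way to "define" a hashtag is to list the words that most often appear alongside it. The sketch below assumes the tweets have already been collected as plain strings; the function name describe_hashtag, the stopword list, and the sample tweets are all hypothetical, and a real project would need a far richer summary than a word list.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption; a real system needs a fuller one).
STOPWORDS = {"the", "a", "an", "is", "of", "to", "in", "and", "at", "for", "on", "rt"}

def describe_hashtag(tweets, hashtag, top_k=3):
    """Return the top_k words that most often co-occur with `hashtag`."""
    tag = hashtag.lower()
    counts = Counter()
    for text in tweets:
        # Tokens are words, optionally prefixed with '#'.
        tokens = re.findall(r"#?\w+", text.lower())
        if tag not in tokens:
            continue
        # Count each word at most once per tweet; skip hashtags and stopwords.
        words = {t for t in tokens if not t.startswith("#") and t not in STOPWORDS}
        counts.update(words)
    return [word for word, _ in counts.most_common(top_k)]

# Hypothetical sample data standing in for tweets fetched at runtime.
sample_tweets = [
    "Runners gather at the start line #marathon",
    "Road closed for the #marathon runners today",
    "Great coffee this morning #latte",
    "RT cheering runners on the road #Marathon",
]
print(describe_hashtag(sample_tweets, "#marathon", top_k=2))
```

Counting each word once per tweet keeps a single repetitive tweet from dominating the description; whether that is the right choice depends on the data you collect.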

A New Personalized Information Subscription Service. Much like Google News aggregates headlines from relatively reliable news sources (e.g., popular news websites) to provide readers with a personalized news subscription service, it would be very interesting to develop a new information subscription service that leverages the rich set of real-time information embedded in online social media and taps into the collective wisdom of ordinary individuals. One major challenge in providing this service is how to efficiently distill and organize information contributed by diverse and unreliable sources, and how to summarize it to a degree that each subscriber feels comfortable reading and trusting.

Real-time Data Analytics. Making sense of huge volumes of social sensing data streams coming from a complex and highly dynamic environment in a timely manner is a big challenge. It would be very interesting to build a new data analysis engine that efficiently organizes a firehose of streaming and heterogeneous data feeds and delivers reliable information with real-time guarantees. Some important problems need to be addressed in order to develop this real-time data analysis engine. For example, how can we distribute data streams over clusters and compute results in a way that optimizes the estimation accuracy while minimizing the analysis time? How can we develop an efficient distributed data analysis algorithm that outputs almost the same results as the centralized version but at a much faster speed?
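To make the distributed-versus-centralized question concrete, here is a minimal sketch using hashtag counting, a case where partial results merge exactly. It assumes tweets are plain strings and uses threads on one machine as a stand-in for cluster nodes; the function names are hypothetical. Many analyses of interest do not merge this cleanly, which is where the accuracy and time trade-off arises.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def count_hashtags(tweets):
    """Centralized version: count hashtags over one list of tweets."""
    counts = Counter()
    for text in tweets:
        counts.update(w.lower() for w in text.split() if w.startswith("#"))
    return counts

def distributed_count(tweets, n_workers=3):
    """Split the tweets into n_workers chunks, count each chunk, then merge."""
    chunks = [tweets[i::n_workers] for i in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(count_hashtags, chunks))
    total = Counter()
    for partial in partials:
        total.update(partial)  # adding counters is an exact merge
    return total

# Hypothetical sample stream.
stream = [
    "Flooding downtown #storm #flood",
    "Power is out #storm",
    "Stay safe everyone #Storm",
    "Traffic is terrible #commute",
]
print(distributed_count(stream) == count_hashtags(stream))
```

Because counts are additive, the merged result equals the centralized one regardless of how the stream is partitioned.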

Multi-genre Network Analysis. A comprehensive understanding of multi-genre networks (e.g., social networks, information networks, and physical networks) plays a critical role in future social sensing applications. For example, a recent heavy traffic jam on a major southern California freeway, detected by the deployed sensor network (i.e., the physical network), co-occurred with an unusual burst of traffic on Twitter (i.e., the social network) around the same location. The contents of the tweets offered a very clear first explanation of the traffic jam: a local tax-related protest demonstration. It would be interesting to develop new techniques that automatically unearth new information by exploring data correlations across multi-genre networks and provide more effective solutions for decision makers.

Big Data Processing and Storage. In just one minute, more than 350,000 new tweets are posted on Twitter, 700,000 status updates are made on Facebook, more than 3,500 images are added to Flickr, and 100 hours of video are uploaded to YouTube. Online social media are creating a deluge of information that greatly exceeds the human capacity to consume it. This information deluge motivates an urgent need for big data techniques that process and store data from online social media efficiently and effectively. It would be interesting to develop novel algorithms and schemes that leverage state-of-the-art distributed systems and cloud computing paradigms (e.g., Hadoop, Amazon EC2) to tackle the big data challenge in social sensing.

Detect and Reduce Redundant Information. Given the large amount of data generated in social sensing applications, the amount of duplicate content and the demand for redundancy reduction are increasing tremendously. For example, Twitter users can easily repeat information from others by using the simple "Retweet" function. Alternatively, some users may rephrase what they have read or learned and post a "new" tweet in a slightly different form. Such redundant information puts a heavy burden on users of micro-blogging services when they search for new content. It would be interesting to develop duplicate detection and redundancy reduction schemes for social sensing applications that dramatically reduce various kinds of duplicates and diversify search results.
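One common baseline for this kind of problem is Jaccard similarity over word n-grams ("shingles"). The sketch below is only an illustration under stated assumptions: tweets are plain strings, retweets carry a leading "RT @user:" prefix, and the threshold values are arbitrary choices, not recommendations. The pairwise comparison is quadratic in the number of tweets, so a real system would need something more scalable.

```python
import re

def shingles(text, n=2):
    """Set of word n-grams, after lowercasing and dropping a leading 'RT @user:'."""
    text = re.sub(r"^rt\s+@\w+:\s*", "", text.lower())
    words = re.findall(r"\w+", text)
    if len(words) < n:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| of two sets (1.0 if both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def deduplicate(tweets, threshold=0.5):
    """Keep a tweet only if it is below `threshold` similarity to every kept tweet."""
    kept = []  # list of (text, shingle set) pairs
    for text in tweets:
        s = shingles(text)
        if all(jaccard(s, kept_s) < threshold for _, kept_s in kept):
            kept.append((text, s))
    return [text for text, _ in kept]

# Hypothetical sample: an original, a retweet, a rephrasing, and an unrelated tweet.
sample = [
    "Bridge closed due to flooding on Main Street",
    "RT @alice: Bridge closed due to flooding on Main Street",
    "Main Street bridge is closed due to flooding",
    "Free concert in the park tonight",
]
print(deduplicate(sample, threshold=0.5))  # drops only the exact retweet
print(deduplicate(sample, threshold=0.3))  # also drops the rephrased tweet
```

In this sample the rephrased tweet shares 4 of 10 distinct bigrams with the original (similarity 0.4), so whether it counts as a duplicate depends entirely on the threshold, which is the kind of design decision a project would need to evaluate.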

Geo-location and Spatial Distribution Problem. Understanding the spatial-temporal distribution of social sensing data is very important in many real-world applications (e.g., disaster tracking, geotagging, crowdsensing). However, many participants choose to disable the geo-location features of their social sensing apps due to the sensitivity of location data (especially when it is coupled with temporal information). For example, normally fewer than 1% of tweets have accurate geo-location information (i.e., GPS coordinates) embedded. Approximately 25% of users list a user location as granular as a city name, and these entries also contain a non-trivial amount of errors and ambiguities (e.g., the same city name in different states). Therefore, it would be very interesting to develop location inference systems that accurately estimate possible locations of social sensing data by combining deeper content analysis (e.g., text mining) with available background knowledge (e.g., a mapping from specific words to given locations).
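The "mapping from specific words to given locations" mentioned above can be illustrated with a tiny word-vote scheme. Everything in this sketch is an assumption made for illustration: the GAZETTEER dictionary is a hand-made toy, the function name infer_location is hypothetical, and majority voting is just one simple way to combine evidence.

```python
import re
from collections import Counter

# Toy word-to-location mapping (hypothetical; a real one would be learned or curated).
GAZETTEER = {
    "fenway": "Boston, MA",
    "mbta": "Boston, MA",
    "wrigley": "Chicago, IL",
    "cta": "Chicago, IL",
    "bart": "San Francisco, CA",
}

def infer_location(text, gazetteer=GAZETTEER):
    """Return the location with the most word votes, or None if none or tied."""
    tokens = re.findall(r"\w+", text.lower())
    votes = Counter(gazetteer[t] for t in tokens if t in gazetteer)
    if not votes:
        return None  # no location-indicative words found
    (best, best_votes), *rest = votes.most_common(2)
    if rest and rest[0][1] == best_votes:
        return None  # tie between locations: treat as ambiguous
    return best

print(infer_location("Stuck on the MBTA again, late for the game at Fenway"))
print(infer_location("Took BART to the airport, flying out to see Wrigley"))
```

Returning None on a tie reflects the ambiguity problem noted above; a fuller system might instead return a ranked list of candidate locations with confidence scores.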

Assembling Information from Structured and Unstructured Data. Data generated in social sensing can be heterogeneous in modality (i.e., both structured and unstructured). For example, structured data could be the numerical readings from the sensors on participants' smartphones. Unstructured data could be a piece of free text or an image that a user uploads to Twitter or Flickr describing the current situation in his or her surroundings. Different tools and techniques have been developed to process and analyze structured and unstructured data separately. However, it remains a big challenge to explore the correlations across data types and fuse useful information from both. It would be interesting to develop new data processing and inference systems that are capable of assembling information from both structured and unstructured data for social sensing applications.

Come Up with Your Own. The above examples are only ideas to get you thinking! You are encouraged to come up with your own idea, or modify one of those above.

Milestones

Friday Sep. 3, Noon. - Sign up for the project group and upload a single PDF file containing the Project Title, Abstract and Member List using the Group Signup Form.

Week of Sep. 20. - Sign up for a slot on Doodle to schedule a project kick-off meeting with the instructor to discuss your project ideas, the resources you might need, and the expected outcome of the project. The instructor will give you feedback to make sure that the project is of appropriate size and level of difficulty. If multiple groups propose substantially similar projects, we may ask you to adjust your work slightly.

Friday Oct. 1, Noon. - Upload a two-page project proposal as a single PDF file to Canvas under Course Project/Project Proposal. The proposal needs to describe an overview of the project (preferably with a diagram), a brief review of the state-of-the-art in the related field, a credible set of initial project results if available, a list of further proposed milestones, and a plan of action for the rest of the semester.

Week of Oct. 11. - Sign up for a slot on Doodle to meet with the instructor for a mid-term project meeting and give a demo of what you have working so far. At this point, you should have installed (or have access to) the appropriate software and systems, have collected a substantial amount of the data you need, have developed the first version of your algorithm/system, and have generated some initial results. We will discuss the plan for finishing the remaining parts of the project in a timely way, and make any necessary corrections or adjustments.

Week of Oct. 18. - Each group is responsible for a short mid-term project presentation in class. The presentation will allow the instructor and classmates to comment on the initial results and current state of the project and also give constructive feedback to the group members. Each project partner should speak for a portion of the time. Your talk should be accompanied by a few carefully designed and edited slides.

Friday Oct. 22, Noon. - Upload a four-page project mid-term report as a single PDF file to Canvas under Course Project/Mid-term Report. The mid-term report should include a reasonable amount of preliminary results, a description of finished milestones, a discussion of problems encountered and their solutions, and any modifications to the plan (if any) for finishing the remaining tasks.

Week of Dec. 6. - Each group will give a final presentation (including a short Q&A session) on its project.

Monday Dec. 13, Noon. - Turn in your final paper (both the PDF file and source files) and your code to Canvas under Course Project/Final Project Paper.

The final project paper should follow the standard technical paper format. A typical paper in this format contains the following components: abstract, introduction, related work, problem statement, solution, evaluation, discussion/limitation, and conclusion. The paper should be long enough to explain all of the necessary details. That said, anything shorter than 8 pages is probably too short; anything longer than 10 pages is probably too long. All elements of the paper should be prepared with care and attention to proper English. Please follow the standard IEEE paper format. Here is the template: IEEE LaTeX or Word Template.

All relevant code should also be turned in, including source code, configuration files, scripts, etc. If you have set up a GitHub repository for your project, please include the link to that repository in your final report. The code should be complete enough that the grader can build and run your work in the appropriate environment. Please also turn in a README file with instructions for running your code/tool. If there are important elements that cannot be turned in as code for whatever reason (e.g., too big or too expensive to download), then turn in links, screenshots, or other similar evidence of the completed work.

IS 596 / Project