Friday, 31 May 2013

Web Scraping Tourism Reviews

I’m sure that many tourism business owners have spent a lot of time investigating review sites like Trip Advisor and Yelp, reading up on what their customers are saying. This is good business practice and tourism operators should always have an open ear to any praise or critique.

It is easy to look at reviews for one particular business, but what about at the regional or provincial level? What about comparing reviews across a destination? Which areas are reviewed the most and which are reviewed the least? Do users of Trip Advisor and Yelp leave a path of reviews as they travel or do only the best/worst experiences get mentioned? What is the percentage of positive vs. negative reviews? What is the overall quality of these reviews? These are just some of the questions that I’ve been thinking about recently.

Last fall I started a small research project that ‘scraped’ reviews from Trip Advisor for Nova Scotia. Web scraping is a somewhat controversial technique that uses software “agents” to harvest information from websites. In a basic sense, it is an automated version of copying specific information from a web site and pasting it into a spreadsheet. The tool I used to accomplish this is Mozenda. I ended up with nearly 6,000 reviews in total, each including user, date, location, star rating (out of 5), and comments. A very rich data source! I did some basic analysis by dividing the reviews into three categories (accommodation, restaurants, and attractions) and then geolocating them at one of 77 different named destinations. I presented this preliminary material at the 2009 Travel and Tourism Research Association – Canada Chapter annual meeting in Guelph, Ontario. You can take a look at the Slideshare here:
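To give a feel for the kind of basic analysis described above, here is a minimal sketch in Python. It is not the workflow used in the project: the field names, the sample records, and the choice to count 4 stars and up as "positive" are all assumptions made for illustration.

```python
from collections import Counter

# Invented sample records. The field names are assumptions,
# not the actual format exported by the scraping tool.
reviews = [
    {"user": "a", "destination": "Halifax", "category": "restaurants", "stars": 5},
    {"user": "b", "destination": "Halifax", "category": "accommodation", "stars": 2},
    {"user": "c", "destination": "Lunenburg", "category": "attractions", "stars": 4},
    {"user": "d", "destination": "Lunenburg", "category": "restaurants", "stars": 1},
]


def summarize(reviews, positive_threshold=4):
    """Count reviews per category and destination, plus the share rated positive."""
    by_category = Counter(r["category"] for r in reviews)
    by_destination = Counter(r["destination"] for r in reviews)
    # Assumption: a review counts as positive at or above the threshold.
    positive = sum(1 for r in reviews if r["stars"] >= positive_threshold)
    share_positive = positive / len(reviews) if reviews else 0.0
    return by_category, by_destination, share_positive


by_category, by_destination, share_positive = summarize(reviews)
print(by_category.most_common())
print(by_destination.most_common())
print(f"{share_positive:.0%} of reviews are positive")
```

Sorting the destination counts from most to least reviewed is one simple way to start answering which areas get the most attention.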


Source: http://geoparticipation.ca/2010/04/30/web-scraping-tourism-reviews/

Restaurant Review: 2 Wives Pizza

Located less than five minutes away from campus, 2 Wives Pizza offers some of the best brick oven pies in town. A new addition to New London, 2 Wives boasts a wild assortment of pizzas, paninis, and starters that many students will soon fall in love with.

Let’s start at the very beginning (a very good place to start): the appetizer. I started off my meal with an order of bruschetta.

Expecting two samples of bread with topping, as seems to be the proper custom at most Italian restaurants, I was pleasantly surprised to find many long strips of warm bread, topped with juicy plum tomato, olive oil and garlic. The topping of the bruschetta was so delicious that I found myself scraping the extra bits which had fallen on the plate with slices of garlic bread.

Next up: the salad. I ordered my perennial favorite, the Caesar. The lettuce was crisp, the dressing was light and delicious and the flakes of cheese were large and strong. I had to stop myself from eating too many starters, for fear that I might be stuffed before the entrée arrived.

Finally, the pizzas arrived. My party ordered a number of pies, including the margherita, the grilled vegetable, the BBQ chicken and classic pepperoni. Each slice of pizza was heavy on the flavor, light on grease. The pies are reasonably priced – the margherita, the cheapest pie on the menu, is $6.50 at eight inches.

Looking at the 2 Wives menu, other interesting pies include the great white clam pizza, the Hawaiian honeymoon, and the trio of wild mushrooms. I wouldn’t mind taking another visit to downtown New London to try another hand-tossed pizza.

Although my party was stuffed with abundant dinners, we took a gander at the dessert menu. Desserts include rich tiramisu cake, golden carrot cake (with a hint of pineapple and coconut), and classic cheesecake.

For students 21 and older, 2 Wives also features a diverse wine and beer bar. The overall look of the place – which might resemble a converted warehouse, pipes along the ceiling and all – promises a relaxing atmosphere, where one can grab a slice and read a book.

My only complaint is that the chairs are uncomfortable to sit on for long dinners; I found myself fidgeting frequently during the hours we spent dining.

Besides that, everything about 2 Wives is warm, inviting, and very tasty.

2 Wives is located at 45 Huntington Street, New London.


Source: http://thecollegevoice.org/2009/09/14/restaurant-review-2-wives-pizza/

Restaurants & Bars Contacts Data Scraper

The Restaurant Data Scraper tool enables you to scrape restaurant information from online restaurant & bar directory websites. It is a fully customized solution that fits your business needs.

Restaurant Screen Scraper Features:

- Our screen scraping tools search the whole targeted website for restaurant details based on input parameters such as category, country, price, and name, and return the list of restaurants found on that website.
- Collect data fields like: Restaurant Name, Street Address, City, State, Zip Code, User Rating, Contact Email, Phone, Fax, Services, Website, Hours, Prices, Payment Accepted Type, User Reviews, Images and many more.
- Extracted data can be exported in various forms such as Excel spreadsheets, CSV, MySQL, MS-Access, XML, MSSQL, Text & HTML files.
- Download restaurant photos from website.
- Avoid IP blocks with the multiple proxy feature. Scrape anonymously, without getting blocked.
- Set custom delay between web requests.
- Automatically remove duplicate listings.
- Very useful for table-booking businesses, vendors, and restaurant menu price comparison services.
- Easy-to-use tool with a quick learning curve that gets right to the point.
- Requires minimal user inputs.
- Compatible with Microsoft Windows XP/Vista/7/8.
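
Three of the features listed above (a custom delay between requests, duplicate removal, and CSV export) can be sketched in a few lines of Python. This is only an illustration of the ideas, not the vendor's implementation. The function names, the field names, and the rule that two listings are duplicates when name and phone match are all assumptions.

```python
import csv
import io
import time


def fetch_all(urls, fetch, delay_seconds=1.0):
    """Call fetch(url) for each URL, pausing between consecutive requests."""
    results = []
    for i, url in enumerate(urls):
        if i > 0:
            time.sleep(delay_seconds)  # custom delay between web requests
        results.append(fetch(url))
    return results


def dedupe(listings):
    """Keep the first listing for each (name, phone) pair.

    Assumption: a matching name and phone number means a duplicate.
    """
    seen = set()
    unique = []
    for item in listings:
        key = (item["name"].strip().lower(), item["phone"].strip())
        if key not in seen:
            seen.add(key)
            unique.append(item)
    return unique


def to_csv(listings, fields=("name", "city", "phone")):
    """Serialize listings to CSV text, ignoring any extra fields."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(listings)
    return buffer.getvalue()


# A stand-in for a real page download, so the sketch runs offline.
def fake_fetch(url):
    return {"name": "Example Bistro", "city": "Springfield", "phone": "555-0100", "url": url}


pages = fetch_all(["page-1", "page-2"], fake_fetch, delay_seconds=0)
print(to_csv(dedupe(pages)))
```

In real use, `fake_fetch` would be replaced by a function that downloads and parses a page, and the delay would be set to something polite for the target site.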

This powerful restaurant data extraction tool is widely used by business experts and professionals who wish to gather data on restaurants, their reviews, item prices, and services. Typical data that we extract includes Name, Address, Cross Street, City, State/Province, Post Code, Neighborhoods, Phone, Fax, Website, Category, Number of Reviews, Star Rating, Description, Longitude, Latitude, Price Range, Hours, Accepts Credit Cards, Parking, Good for Kids, Take-out, Waiter Service, Outdoor Seating, Dogs Allowed, Best Nights, Happy Hour, and Drive-Thru.

Source: http://www.websitescraper.com/restaurants-bars-contacts-scraper.php

Restaurants Review Scraping

Hotels, restaurants, and more; we can harvest the web and normalize the content no matter how different the sites are.

Most Hospitality and Travel businesses are based on providing their clients with the best deals and reviews on demand. They are experts in their respective industries and specialists in semantic technology. The challenge is how to get the immense amount of data from the Internet.

For this data to have any real value, not only is the hotel and restaurant information required on a global scale, but so are all of the related reviews, responses and ratings. These web pages are spread throughout the Internet, and there may be millions of pages to traverse. Every site is designed differently and has its own structure and styling. The reviews and responses are all unique pieces of information that need to be extracted and stored individually while still maintaining their relationship to where they came from. There are no standards for how this type of information is displayed online or for the methods used to rate an experience.
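
One way to picture the requirement that reviews and responses be stored individually while keeping their relationship to their source is a small nested data model. The sketch below is an assumption made for illustration; the class and field names are hypothetical and do not describe how any particular extractor stores its data.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Response:
    author: str
    text: str


@dataclass
class Review:
    source_url: str  # the page the review was extracted from
    author: str
    text: str
    rating: Optional[float] = None  # None when the site shows no rating
    responses: List[Response] = field(default_factory=list)


@dataclass
class Business:
    name: str
    kind: str  # e.g. "hotel" or "restaurant"
    reviews: List[Review] = field(default_factory=list)


hotel = Business(name="Example Inn", kind="hotel")
review = Review(source_url="https://example.com/example-inn", author="guest1", text="Quiet rooms.", rating=4.0)
review.responses.append(Response(author="manager", text="Thank you for staying with us."))
hotel.reviews.append(review)

print(hotel.reviews[0].responses[0].author)
```

Each review remains a separate record, yet it can always be traced back to both the business and the page it came from.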

The 30 Digits Web Extractor was born to solve issues like these and many more. The Web Extractor was originally designed to address the problems that the Hospitality and Travel semantic industry encounters. The core emphasis of the Extractor is web data harvesting: being able to follow complex links and dig out the exact, essential pieces of data. Unearthing the relevant core information from a page provides significant time savings and boosts content quality. Whether it is navigating links on the page, filling in forms or building links from data on the page, the Extractor leaps over these obstacles.

These were the founding features of the Extractor, upon which it has continued to evolve. By making full use of regular expressions, matching data patterns becomes child’s play. The problem of content and images that are easily recognized by the human eye but often elusive to machines has been eliminated.
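
As a rough illustration of matching data patterns with regular expressions, the Python sketch below pulls a rating and a date out of a made-up HTML fragment. The markup, the patterns, and the function name are assumptions for this example only and are not taken from the Web Extractor.

```python
import re

# Invented fragment; real review pages will be structured differently.
html = (
    '<div class="review"><span class="rating">4.5 of 5 stars</span>'
    '<span class="date">Reviewed 12 March 2010</span></div>'
)

# "4.5 of 5 stars" -> value and scale
RATING = re.compile(r"(\d+(?:\.\d+)?)\s+of\s+(\d+)\s+stars")
# "Reviewed 12 March 2010" -> day, month name, year
DATE = re.compile(r"Reviewed\s+(\d{1,2})\s+([A-Za-z]+)\s+(\d{4})")


def extract(snippet):
    """Return the rating, its scale, and the date text found in a snippet."""
    rating = RATING.search(snippet)
    date = DATE.search(snippet)
    return {
        "rating": float(rating.group(1)) if rating else None,
        "scale": int(rating.group(2)) if rating else None,
        "date": " ".join(date.groups()) if date else None,
    }


print(extract(html))
```

Patterns like these are tied to one site's wording, so each new site would likely need its own set.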

Different sites have different points of reference or scales; there is no commonality between them. Ratings can be shown as numbers, symbols, or images. Dates can be formatted as numbers or text and are also affected by the language of a site. All of these pieces of data have to be standardized so that they have meaning in comparison to each other.
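
The standardization described above might look something like the following Python sketch, which maps ratings onto a common 0 to 1 scale and dates onto ISO format. The list of date formats, the star symbol, and the function names are assumptions. The sketch handles English month names only, and it treats numeric dates with slashes as day/month/year, which would be wrong for sites that use month/day/year.

```python
from datetime import datetime


def normalize_rating(value, scale_max):
    """Map a numeric rating onto a common 0-1 scale."""
    return value / scale_max


def rating_from_symbols(text, symbol="★", scale_max=5):
    """Convert a rating shown as symbols, such as filled stars, to 0-1."""
    return text.count(symbol) / scale_max


# Assumed formats; each site may need its own list. The month-name formats
# rely on the default locale producing English month names.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d %B %Y", "%B %d, %Y")


def normalize_date(text):
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


print(normalize_rating(8, 10), rating_from_symbols("★★★☆☆"))
print(normalize_date("12 March 2010"), normalize_date("30/04/2010"))
```

Once ratings and dates share one representation, a 4 out of 5 on one site and an 8 out of 10 on another can be compared directly.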

The vast quantity of information all over the Internet pursued by Hospitality and Travel businesses can now be harvested and analyzed to help customers make more informed decisions. Hotel and restaurant owners can see how they are viewed and react to the social perception of their products and services.

Source: http://www.30digits.com/review-scraping-hotels-restaurants-hospitality-19.htm