… is extracting data from a single website or from multiple websites and saving it into a local file on your computer.
Have you ever needed to pull data out of a website?
If the website provides an API for the information you want, one has to write a program that calls the API functions to obtain the data and saves it into a database. Sometimes you want the data on a daily or weekly basis because it keeps changing. Then the program has to run automatically every day or week and update the database.
Once the data is in the database, another program is needed to use it the way you like, for example, to export it into a CSV file.
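The store-then-export flow described above can be sketched with Python's built-in `sqlite3` and `csv` modules. This is a minimal illustration under assumptions: SQLite stands in for "a database", the `items` table and the `(item_id, name, price)` record shape are hypothetical, and the API call that would produce the records is left out because every API differs.

```python
import csv
import io
import sqlite3
from typing import Iterable, TextIO, Tuple

# Hypothetical record shape (item_id, name, price); a real API defines its own fields.
Record = Tuple[str, str, float]


def save_records(conn: sqlite3.Connection, records: Iterable[Record]) -> None:
    """Insert new rows and overwrite existing ones, so each periodic run refreshes the table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items "
        "(item_id TEXT PRIMARY KEY, name TEXT, price REAL)"
    )
    # INSERT OR REPLACE replaces any row whose item_id already exists.
    conn.executemany("INSERT OR REPLACE INTO items VALUES (?, ?, ?)", records)
    conn.commit()


def export_csv(conn: sqlite3.Connection, out: TextIO) -> None:
    """Write the whole table, preceded by a header row, to a text file object."""
    writer = csv.writer(out)
    writer.writerow(["item_id", "name", "price"])
    writer.writerows(conn.execute("SELECT item_id, name, price FROM items ORDER BY item_id"))


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    save_records(conn, [("a1", "Widget", 9.5), ("b2", "Gadget", 12.0)])
    save_records(conn, [("a1", "Widget", 8.75)])  # a later run updates a1's price
    buf = io.StringIO()
    export_csv(conn, buf)
    print(buf.getvalue())
```

To write a real file instead of an in-memory buffer, pass `open("items.csv", "w", newline="")` as `out`; the `newline=""` argument is what the `csv` module documentation recommends.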
Most of the time an API is simply not available. In this case one must crawl the web pages and pull out the required data. Web pages are unstructured, so the programming is quite different. It requires fetching the appropriate pages, sometimes following links programmatically, and dealing with several challenges, described below.
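Following links programmatically usually means parsing each fetched page for its `<a href>` tags and resolving relative URLs against the page's own address. The sketch below does this with the standard-library `html.parser` and `urllib.parse.urljoin`; the helper `find_next_page` and its "next" label are hypothetical, assuming the site marks pagination with a link whose text contains "next". Fetching is omitted (the HTML could come from `urllib.request.urlopen`), and content that Ajax inserts after page load is not present in the raw HTML, so this approach will not see it.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect (absolute_url, link_text) pairs for every <a href="..."> on a page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[Tuple[str, str]] = []
        self._href: Optional[str] = None  # URL of the <a> tag we are inside, if any
        self._text: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links such as "item/2" or "?page=2" against the page URL.
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def find_next_page(links: List[Tuple[str, str]], label: str = "next") -> Optional[str]:
    """Return the first link whose text contains `label` (case-insensitive), else None."""
    for url, text in links:
        if label in text.lower():
            return url
    return None
```

Usage: create `LinkCollector("https://example.com/catalog/")`, call `feed(html)` and then `close()`, and read `links`. A crawler would queue the item links and call `find_next_page` on each page until it returns `None`.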
- Rate Limitation
Some high-visibility websites restrict how often or how fast you can traverse their web pages, or how often you can repeatedly log in and log out.
- Anonymous or multiple logins for access
Sometimes multiple accounts are necessary, if the website or its API limits how much data you can extract, or how fast, in a given time period.
- Ajax-based websites
Ajax technology results in pages, or parts of pages, being populated with data at a later stage, for example when a button or link is clicked.
Often, when there is a large amount of data, it is paginated, so one has to click “next page” or “previous page” to see all of it.
- Changes in the website
When the website’s developer changes the formatting of the content, the data extraction program can break, so a method to detect such changes has to be put in place.
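Two of the challenges above, rate limitation and changes in the website, lend themselves to small helpers. The sketch below is illustrative, not a complete solution: `RateLimiter` simply enforces a minimum pause between requests (the right interval depends on the site's terms), and `check_layout` flags results that look like the page structure changed, assuming hypothetical required fields `name` and `price` and at least one record per page.

```python
import time
from typing import Dict, List, Optional, Sequence


class RateLimiter:
    """Ensure at least `min_interval` seconds pass between successive requests."""

    def __init__(self, min_interval: float) -> None:
        self.min_interval = min_interval
        self._last: Optional[float] = None  # monotonic time of the previous request

    def wait(self) -> None:
        """Call before each request; sleeps only if the previous one was too recent."""
        now = time.monotonic()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()


REQUIRED_FIELDS = ("name", "price")  # hypothetical fields the scraper expects


def check_layout(
    records: List[Dict[str, str]],
    required: Sequence[str] = REQUIRED_FIELDS,
    min_records: int = 1,
) -> List[str]:
    """Return problems suggesting the page layout changed; an empty list means none found."""
    problems = []
    if len(records) < min_records:
        problems.append(f"expected at least {min_records} records, got {len(records)}")
    for i, rec in enumerate(records):
        # A field that is absent or empty often means its HTML element moved or was renamed.
        missing = [f for f in required if not rec.get(f)]
        if missing:
            problems.append(f"record {i} missing {', '.join(missing)}")
    return problems
```

A scheduled run could call `limiter.wait()` before every fetch and send an alert, rather than silently saving bad data, whenever `check_layout` returns a non-empty list.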
Do You Need Any Data Scraping or Data Mining?
Phase I: Evaluation
- The process by which one determines whether data scraping is possible or not.
Phase II: Implementation
- Writing programs to scrape the data and deal with the challenges above
- Implementing timers to extract data on a periodic basis if required
- Deciding how you want the data delivered
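Periodic extraction, as in the timer step above, comes down to computing when the next run is due and waiting until then. The following is a minimal sketch assuming naive local times (daylight-saving transitions are not handled); `run_daily` is a hypothetical helper, and in practice the operating system's scheduler, such as cron or Windows Task Scheduler, is often a sturdier choice than a long-running loop.

```python
import time
from datetime import datetime, timedelta
from typing import Callable


def next_daily_run(now: datetime, hour: int, minute: int = 0) -> datetime:
    """Next time at hour:minute strictly after `now`."""
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate


def next_weekly_run(now: datetime, weekday: int, hour: int, minute: int = 0) -> datetime:
    """Next given weekday (Monday=0 ... Sunday=6) at hour:minute, strictly after `now`."""
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    candidate += timedelta(days=(weekday - now.weekday()) % 7)
    if candidate <= now:
        candidate += timedelta(days=7)
    return candidate


def run_daily(job: Callable[[], None], hour: int, minute: int = 0) -> None:
    """Sleep until the next scheduled time, run `job`, and repeat forever."""
    while True:
        target = next_daily_run(datetime.now(), hour, minute)
        # Loop in case the sleep ends early, e.g. after a system clock adjustment.
        while datetime.now() < target:
            time.sleep(max(0.0, (target - datetime.now()).total_seconds()))
        job()
```

For example, `next_weekly_run(datetime(2024, 1, 10, 9, 30), 0, 6)` gives Monday, 15 January 2024 at 06:00, since 10 January 2024 was a Wednesday.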
Request an Evaluation
For a small fee we can perform Phase I for your needs. Provide us with as much information as you can in the form below, and we will get back to you with an assessment.
Request Programming Minds, Inc to perform a data scraping evaluation of the following data:
From the following site:
Is the data to be extracted one time or periodically?