Web crawler built in Node.js
A web crawler built to store all the hyperlinks encountered on a website and crawl through them to fetch subsequent ones. It stores data for every unique URL: the query parameters associated with a pathname and the number of occurrences of the URL over the given iterations.
The application is integrated with MongoDB Atlas (cloud) for storing data, so no local database dependency is needed. For more information, refer to: https://docs.atlas.mongodb.com/
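For illustration, a crawled link might be stored as a document along these lines (the field names here are assumptions for the sketch, not taken from the crawler's actual schema):

```js
// Hypothetical shape of a stored link document -- field names are
// illustrative only; the real schema is defined in the crawler's code.
const exampleDocument = {
  url: 'https://medium.com/topics?source=home', // unique URL encountered
  pathname: '/topics',                          // pathname portion of the URL
  queryParams: { source: 'home' },              // query parameters seen for this pathname
  occurrences: 12                               // times the URL was encountered across iterations
};
```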
The configuration for the crawler can be changed in config.js. The defaults are (see the sketch after this list):
- baseUrl: the starting URL (https://medium.com)
- totalIterations: total iterations for the web crawler (25)
- promiseTimeout: timeout for a promise on a URL miss (5000)
- requestTimeout: timeout for every request (10000)
- requestConcurrency: maximum promises to be executed in parallel (5)
- mongodbConnectionString: MongoDB connection string
- mongodbCollection: MongoDB collection (LinkCollection)
- mongodbDatabase: MongoDB database (LinkDatabase)
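A minimal sketch of what config.js might look like with these defaults (the key names follow the list above; the module shape and the connection-string placeholder are assumptions):

```js
// config.js -- sketch based on the defaults listed above
module.exports = {
  baseUrl: 'https://medium.com',        // starting URL for the crawl
  totalIterations: 25,                  // total crawl iterations
  promiseTimeout: 5000,                 // ms before a promise for a missed URL times out
  requestTimeout: 10000,                // ms before an individual request times out
  requestConcurrency: 5,                // max promises executed in parallel
  // Placeholder Atlas connection string -- substitute your own credentials.
  mongodbConnectionString: 'mongodb+srv://<user>:<password>@<cluster>/',
  mongodbCollection: 'LinkCollection',  // collection for crawled links
  mongodbDatabase: 'LinkDatabase'       // database name
};
```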
There is no online demo, so if you want to see how the app works end to end, you will need to do the following:
- Install Node.js (with npm)
- Clone this repository
- Run `npm install` in the project directory
- Run `npm run start` to begin the web crawler
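In a shell, the steps look like this (the repository URL and directory name are placeholders):

```sh
git clone <repository-url>   # replace with this repository's URL
cd <project-directory>
npm install                  # install dependencies
npm run start                # start the crawler using the config.js defaults
```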