
Ellery Yang

I write about products and the PM role.


Introducing EllerySearch

Posted on July 20, 2015 (updated April 25, 2020) by Ellery

 

I have been curious about the way search engines crawl the internet. Recently, I revisited the material on this topic from my Intro to CS course at McGill. In the spirit of learning by doing, I got my hands dirty and built a lightweight search engine that crawls a single website. You can check it out here.

EllerySearch HomePage

Some basic information on this search engine: it runs on JRE 7 and Tomcat 7; it is configured to search only within a single domain; it supports only one-word queries; and no database backs it at the moment. By default, it is configured to crawl links within elleryyang.com, so if you want to find out which of my posts mention the word computer, simply type computer in the search box and click search.

Search results for keyword "computer"
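The one-word, database-free lookup described above can be pictured as an in-memory inverted index. The following is a minimal sketch in Python for brevity (the engine itself runs on Java), not the code behind EllerySearch. The function names `build_index` and `search`, the tokenization rule, and the sample pages are all assumptions made for illustration.

```python
import re
from collections import defaultdict

def build_index(pages):
    """Map each lowercase word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        # Assumed tokenization: runs of ASCII letters and digits.
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index

def search(index, keyword):
    """Return the sorted URLs for a single keyword; reject multi-word queries."""
    keyword = keyword.strip().lower()
    if len(keyword.split()) != 1:
        raise ValueError("only one-word queries are supported")
    return sorted(index.get(keyword, set()))

# Hypothetical page contents, for illustration only.
pages = {
    "elleryyang.com/post-a": "My first computer ran DOS.",
    "elleryyang.com/post-b": "Notes on product management.",
    "elleryyang.com/post-c": "Building a computer from parts.",
}
index = build_index(pages)
results = search(index, "Computer")  # matching is case-insensitive
```

Because everything lives in a dictionary, the index disappears when the process stops, which matches the "no database" limitation but would not scale to a large crawl.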

The results page lists all results in order of PageRank. Each result entry contains the keyword, a link, and a PageRank value for the user's reference. Results are restricted to the domain elleryyang.com: external links such as my LinkedIn are not crawled, even though my spider does detect them.
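For readers unfamiliar with PageRank, here is a rough sketch of the textbook power-iteration version, again in Python rather than the project's Java. The post does not say which variant or parameters EllerySearch uses, so the damping factor of 0.85, the iteration count, and the tiny link graph are assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook power-iteration PageRank.

    links maps each page to a list of pages it links to; every link
    target must also appear as a key.
    """
    pages = list(links)
    n = len(pages)
    if n == 0:
        return {}
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages with no outlinks is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links[p])
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for p in pages:
            for target in links[p]:
                new_rank[target] += damping * rank[p] / len(links[p])
        rank = new_rank
    return rank

# Hypothetical link graph, for illustration only.
links = {
    "home": ["about", "post"],
    "about": ["home"],
    "post": ["home"],
}
ranks = pagerank(links)
# Results would then be listed from highest to lowest rank.
ordered = sorted(ranks, key=ranks.get, reverse=True)
```

In this toy graph, "home" ends up first because both other pages link to it, while "about" and "post" tie.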

To point the spider at a different site, I go to crawl-reset.jsp and enter the new site, a starting point, and my admin key. If you would like to test the spider, feel free to request an admin key. Here I will crawl my McGill SOCS homepage.

Crawl page

The spider now crawls all links within the domain cs.mcgill.ca/~yyang121, starting from cs.mcgill.ca/~yyang121. If the crawl succeeds, the search engine is updated to search within cs.mcgill.ca/~yyang121, and I get a success message.

Crawl results
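The crawl described above, which starts at one URL and follows only links inside the chosen site, can be sketched as a breadth-first traversal. This is an illustrative Python sketch under stated assumptions, not the spider's actual Java code: `in_scope`, `crawl`, and the `get_links` callback are hypothetical names, and a small dictionary (`fake_web`, with made-up URLs under the site) stands in for real HTTP fetches and HTML parsing.

```python
from collections import deque

def in_scope(url, site):
    """True if url is the site root or lies underneath it."""
    return url == site or url.startswith(site.rstrip("/") + "/")

def crawl(start, site, get_links):
    """Breadth-first crawl from start, following only links inside site.

    Returns the pages visited (in order) and the out-of-scope links
    that were detected but not followed.
    """
    visited = []
    seen = {start}
    skipped = set()
    queue = deque([start])
    while queue:
        url = queue.popleft()
        visited.append(url)
        for link in get_links(url):
            if not in_scope(link, site):
                skipped.add(link)  # detected, but never crawled
            elif link not in seen:
                seen.add(link)
                queue.append(link)
    return visited, skipped

# Hypothetical link structure standing in for real page fetches.
site = "cs.mcgill.ca/~yyang121"
fake_web = {
    site: [site + "/projects", "linkedin.com/in/example"],
    site + "/projects": [site, "cs.mcgill.ca/~other"],
}
visited, skipped = crawl(site, site, lambda url: fake_web.get(url, []))
```

Comparing against the site prefix plus a trailing slash keeps a sibling path such as cs.mcgill.ca/~other out of scope even though it shares the same host.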

Now if I query a keyword, the result links come from cs.mcgill.ca/~yyang121.

Results after re-crawl

This project still needs extra work, but feel free to try it out and give feedback!
