Distributed Systems and Algorithms

Date:

Scalable all-pairs similarity search in metric spaces on a distributed system;
building a language model based on the Sougou dataset.
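The reproduced paper targets a distributed setting, and the sketch below is not its algorithm. It is a single-machine illustration, under stated assumptions, of the metric-space pruning idea that such methods build on. For any pivot p, the triangle inequality gives d(x, y) ≥ |d(x, p) − d(y, p)|, so after sorting points by distance to p, each point only needs to be compared with the following points whose pivot distance is within the threshold t. The class and method names, the Euclidean metric, and the choice of the first point as the single pivot are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class PivotAllPairs {
    /** Euclidean distance (assumption); any metric satisfying the triangle inequality works. */
    static double dist(double[] a, double[] b) {
        double s = 0;
        for (int k = 0; k < a.length; k++) {
            double d = a[k] - b[k];
            s += d * d;
        }
        return Math.sqrt(s);
    }

    /** Returns all index pairs (i, j), i < j, with dist(points[i], points[j]) <= t. */
    static List<int[]> allPairs(double[][] points, double t) {
        double[] pivot = points[0]; // hypothetical choice: the first point is the pivot
        int n = points.length;
        double[] toPivot = new double[n];
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            toPivot[i] = dist(points[i], pivot);
            order[i] = i;
        }
        // Sort indices by distance to the pivot so candidates form a contiguous window.
        Arrays.sort(order, Comparator.comparingDouble(i -> toPivot[i]));
        List<int[]> result = new ArrayList<>();
        for (int a = 0; a < n; a++) {
            int i = order[a];
            for (int b = a + 1; b < n; b++) {
                int j = order[b];
                // d(x, y) >= d(y, p) - d(x, p); once this gap exceeds t, every later
                // point in sorted order is also farther than t from x.
                if (toPivot[j] - toPivot[i] > t) {
                    break;
                }
                if (dist(points[i], points[j]) <= t) {
                    result.add(new int[] {Math.min(i, j), Math.max(i, j)});
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] pts = {{0, 0}, {1, 0}, {5, 5}, {5.5, 5}, {10, 0}};
        for (int[] p : allPairs(pts, 1.0)) {
            System.out.println(p[0] + " " + p[1]);
        }
    }
}
```

On the sample points with t = 1.0 this reports the pairs (0, 1) and (2, 3) while skipping most distance computations; a distributed version would additionally partition the data by pivot across workers.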

Main Work

  • Building a Rocks cluster on 20 bare machines and configuring Hadoop and Spark on it.
  • Reproducing the paper “Scalable All-Pairs Similarity Search in Metric Spaces” and applying a hierarchical clustering method to compress its output.
  • Building a language model on the Sougou dataset and applying smoothing so that unseen word sequences do not receive zero probability.
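The entry does not say which model order or smoothing method was used. As a hedged illustration only, the sketch below assumes a bigram model with add-one (Laplace) smoothing, P(w | h) = (c(h, w) + 1) / (c(h) + V), where c(h) counts bigrams starting with history h and V is the number of word types that can follow (including the end marker). Sentences are assumed to be pre-tokenized; the class name, the `<s>`/`</s>` markers, and the method names are hypothetical, and out-of-vocabulary words are not handled specially.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class BigramLM {
    private final Map<String, Integer> historyCounts = new HashMap<>();
    private final Map<String, Map<String, Integer>> bigramCounts = new HashMap<>();
    private final Set<String> vocab = new HashSet<>(); // word types that can be predicted

    /** Counts bigrams over tokenized sentences padded with <s> and </s>. */
    public void train(List<List<String>> sentences) {
        for (List<String> sentence : sentences) {
            String prev = "<s>";
            for (int k = 0; k <= sentence.size(); k++) {
                String cur = k < sentence.size() ? sentence.get(k) : "</s>";
                vocab.add(cur);
                historyCounts.merge(prev, 1, Integer::sum);
                bigramCounts.computeIfAbsent(prev, h -> new HashMap<>()).merge(cur, 1, Integer::sum);
                prev = cur;
            }
        }
    }

    /** Add-one (Laplace) smoothed estimate of P(cur | prev). */
    public double prob(String prev, String cur) {
        int joint = bigramCounts.getOrDefault(prev, Map.of()).getOrDefault(cur, 0);
        int hist = historyCounts.getOrDefault(prev, 0);
        return (joint + 1.0) / (hist + vocab.size());
    }

    /** Natural-log probability of a whole sentence, including the end marker. */
    public double logProb(List<String> sentence) {
        double lp = 0;
        String prev = "<s>";
        for (int k = 0; k <= sentence.size(); k++) {
            String cur = k < sentence.size() ? sentence.get(k) : "</s>";
            lp += Math.log(prob(prev, cur));
            prev = cur;
        }
        return lp;
    }

    public static void main(String[] args) {
        BigramLM lm = new BigramLM();
        lm.train(List.of(List.of("i", "love", "beijing"), List.of("i", "love", "shanghai")));
        System.out.println(lm.prob("i", "love"));        // seen bigram
        System.out.println(lm.prob("love", "i"));        // unseen bigram, still nonzero
        System.out.println(lm.logProb(List.of("i", "love", "shanghai")));
    }
}
```

Add-one smoothing is chosen here only because it is the shortest correct example; for a corpus the size of Sougou, methods such as Kneser-Ney usually give better estimates.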

Technology

  • Language: Java
  • Rocks Cluster
  • Hadoop and Spark

Link

Source Code
Report