The CV-search application searches a corpus of CVs (in fact, any *.pdf, *.doc, or *.txt documents) using another corpus of documents as the query. The code can also serve as a tutorial on text mining and natural language processing with the Spark Machine Learning Library (MLlib).
- Put documents into the data/corpus folder (there are some examples there already)
- Put documents into the data/query folder (there are some examples there already)
- Run from the command line:

  `sbt run`

- See the results in the data/results folder
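For reference, the folder layout the steps above assume looks like this (folder names are taken directly from the steps; the annotations are descriptive):

```
data/
  corpus/    documents to be ranked (*.pdf, *.doc, *.txt)
  query/     example documents that act as the query
  results/   ranking output written by the application
```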
Assume we have a set of CVs in different formats that we want to search.
There are several ways to compose a search query. One option is to search the text of the CVs for keywords like "java", "teamlead", or "scrum". This approach is not very practical: there are only so many keyword combinations one can come up with, far too few to describe a much richer set of CVs.
Another approach is to parse the CVs, extract the important fields into a database, and then query that database. This approach is complex because of the parsing and data-extraction steps, and it also requires the user to know a query language such as SQL.
What if we could query our CV base by supplying a bunch of example CVs and looking for similar ones? This application sorts the CVs (in *.pdf, *.doc, *.txt formats) from the "corpus" folder in order of their relevance to the CVs in the "query" folder. In other words, you query by example instead of by keywords.
It uses the Tika parser and Spark MLlib's TF-IDF implementation to sort a set of documents according to their similarity to the query documents.
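The sketch below shows what such a pipeline can look like; it is a minimal illustration of the technique, not the project's actual code. It extracts plain text with Tika, weights terms with MLlib's hashing TF and IDF, and ranks each corpus document by its best cosine similarity to any query document. The folder paths match the layout above; everything else (object name, tokenizer, ranking rule) is an assumption for illustration.

```scala
import java.io.File
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.feature.{HashingTF, IDF}
import org.apache.spark.mllib.linalg.Vector
import org.apache.tika.Tika

// Hypothetical sketch of the TF-IDF ranking pipeline described above.
object CvSearchSketch {

  // Read every file in a directory and tokenize its Tika-extracted text.
  def readDocs(dir: String): Seq[(String, Seq[String])] = {
    val tika = new Tika()
    new File(dir).listFiles().toSeq.map { f =>
      f.getName -> tika.parseToString(f).toLowerCase
        .split("\\W+").filter(_.nonEmpty).toSeq
    }
  }

  // Cosine similarity between two (possibly sparse) MLlib vectors.
  def cosine(a: Vector, b: Vector): Double = {
    val (sa, sb) = (a.toSparse, b.toSparse)
    val bByIndex = sb.indices.zip(sb.values).toMap
    val dot = sa.indices.zip(sa.values)
      .map { case (i, v) => v * bByIndex.getOrElse(i, 0.0) }.sum
    val norm = math.sqrt(sa.values.map(x => x * x).sum) *
               math.sqrt(sb.values.map(x => x * x).sum)
    if (norm == 0.0) 0.0 else dot / norm
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("cv-search").setMaster("local[*]"))

    val corpus = readDocs("data/corpus") // documents to rank
    val query  = readDocs("data/query")  // example CVs acting as the query

    // TF via feature hashing; IDF is fitted on corpus + query together
    // so that term weights are shared between them.
    val hashingTF = new HashingTF()
    val tf = sc.parallelize(
      (corpus ++ query).map { case (_, tokens) => hashingTF.transform(tokens) })
    tf.cache()
    val tfidf = new IDF().fit(tf).transform(tf).collect()

    val (corpusVecs, queryVecs) = tfidf.splitAt(corpus.size)

    // Rank each corpus document by its best similarity to any query document.
    val ranked = corpus.map(_._1).zip(corpusVecs)
      .map { case (name, vec) => name -> queryVecs.map(cosine(vec, _)).max }
      .sortBy(-_._2)

    ranked.foreach { case (name, score) => println(f"$score%.4f  $name") }
    sc.stop()
  }
}
```

Ranking by the maximum similarity to any single query document is just one possible choice; averaging the query vectors into a centroid and ranking against that is an equally plausible alternative.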