If you work in the HR software industry or in an HR department, you have almost certainly thought about structuring your CV data. Resume-structuring projects can be crucial to your business, and they range from simple user experience improvements to strategic product roadmap advancements. Here are some common use cases for structured CV data in the HR software industry:

  • Creating an efficient and relevant talent search experience
  • Getting market-relevant insights about your talent pools
  • Building usable datasets for an AI-based job matching tool

Resume Parsing: the inevitable solution to your problem

Copyright Riminder 2018

A resume parser is software that takes a resume as input, in any media format (PDF, Word, or image) and any template, and converts it into a structured data format such as XML or JSON.

The information extracted by a resume parser usually includes:

  • Personal information: name, email, address, phone
  • List of experiences: start date, end date, location, job title, company, description, …
  • List of education entries: start date, end date, location, degree, university, …
  • List of skills, …
  • List of interests
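The fields listed above map naturally onto a nested record. Below is a minimal Python sketch of such a record, serialized to JSON. The class and field names are hypothetical and cover only the fields listed above; this is not Segmentr's actual output schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional

# Hypothetical schema for illustration only; real parsers define their own.

@dataclass
class PersonalInfo:
    name: Optional[str] = None
    email: Optional[str] = None
    address: Optional[str] = None
    phone: Optional[str] = None

@dataclass
class Experience:
    start_date: Optional[str] = None  # e.g. "2016-09"
    end_date: Optional[str] = None
    location: Optional[str] = None
    job_title: Optional[str] = None
    company: Optional[str] = None
    description: Optional[str] = None

@dataclass
class Education:
    start_date: Optional[str] = None
    end_date: Optional[str] = None
    location: Optional[str] = None
    degree: Optional[str] = None
    university: Optional[str] = None

@dataclass
class ParsedResume:
    personal_info: PersonalInfo = field(default_factory=PersonalInfo)
    experiences: List[Experience] = field(default_factory=list)
    educations: List[Education] = field(default_factory=list)
    skills: List[str] = field(default_factory=list)
    interests: List[str] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() recurses into nested dataclasses and lists of dataclasses
        return json.dumps(asdict(self), indent=2)

resume = ParsedResume(
    personal_info=PersonalInfo(name="Jane Doe", email="jane@example.com"),
    experiences=[Experience(start_date="2016-09", end_date="2018-06",
                            job_title="Data Analyst", company="Acme Corp")],
    skills=["SQL", "Python"],
)
print(resume.to_json())
```

Missing values stay `None` (JSON `null`) rather than being dropped, so downstream consumers can tell "not found" apart from "field not supported".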

Seems easy? The reality is much harder.
No improvement in more than 10 years

Here are a few figures:
  • More than 1.4 billion resumes are parsed every year
  • More than 40% of resumes have a complex layout (multi-column, etc.)
  • More than 7% of resumes are scans or images

The first resume parsers were born in the late 1990s to provide data-structuring technology to HR software companies looking for a stand-alone, packaged solution so they could focus on their core business. Some of these first movers are:

  • Sovren (1996)
  • TextKernel (2001)
  • Daxtra (2002)

How do Daxtra, Sovren, Hireability, Textkernel, and Segmentr (by Riminder) perform at this task?

Building a general and reliable parser requires many different building blocks.
For instance, the system should be able to handle:

  • complex layouts (e.g., multi-column resumes, pictures with backgrounds)
  • ambiguous entities (e.g., Facebook as a former employer vs. a social media skill)
  • different media formats (PDF, Word, Image, etc.)
  • multiple languages
  • etc.
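Handling different media formats usually starts with detecting what kind of file was received. Below is a minimal sketch that sniffs the leading "magic bytes" of PDF, legacy Word (.doc), Word (.docx), PNG, and JPEG files. The function names are hypothetical, and this is an illustrative heuristic, not a description of how any of the compared parsers work.

```python
import io
import zipfile

# Leading byte signatures of the formats a resume parser commonly receives.
SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "doc"),  # legacy Word (OLE2 compound file)
    (b"PK\x03\x04", "zip"),                        # .docx is a ZIP container
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
]

def _zip_kind(data: bytes) -> str:
    """Distinguish a Word .docx from any other ZIP archive."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            if "word/document.xml" in archive.namelist():
                return "docx"
    except zipfile.BadZipFile:
        pass
    return "zip"

def sniff_format(data: bytes) -> str:
    """Guess a file's media format from its first bytes (not a full validator)."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return _zip_kind(data) if name == "zip" else name
    return "unknown"
```

A parser could then route `"pdf"`, `"doc"`, and `"docx"` files to text extraction and `"png"`/`"jpeg"` files to OCR; scanned PDFs that contain only images would still need an extra check.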

The following comparison of some of Segmentr's features with well-known existing resume parsing solutions is the result of extensive validation tests we ran at Riminder:

Features benchmark

Segmentr (by Riminder) is the only resume parser able to handle examples such as these

Left: high-density one-column resume. Center: resume with a complex layout. Right: picture of a multi-column resume with a combined background, shadow, and distortion.

We also measured the performance of each solution on a validation dataset of around 100 randomly sampled resumes. For each output, we averaged the accuracy obtained across the extracted labels. The graph below summarizes the results:

Extracted information accuracy overview
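The averaging described above can be sketched in Python as follows. The article does not specify how a single label is scored, so this sketch assumes a recall-style rule: the share of expected values for a label that appear in the parser output, compared case- and whitespace-insensitively. The data representation and function names are hypothetical.

```python
from statistics import mean
from typing import Dict, List

Record = Dict[str, List[str]]  # label -> extracted values (hypothetical representation)

def normalize(value: str) -> str:
    # Assumed matching rule: case-insensitive, whitespace-insensitive exact match.
    return " ".join(value.lower().split())

def label_accuracy(predicted: List[str], expected: List[str]) -> float:
    """Share of the expected values for one label found in the parser output."""
    if not expected:
        # Nothing to find: correct only if the parser also returned nothing.
        return 1.0 if not predicted else 0.0
    found = {normalize(v) for v in predicted}
    hits = sum(1 for v in expected if normalize(v) in found)
    return hits / len(expected)

def resume_accuracy(predicted: Record, gold: Record) -> float:
    """Average the per-label accuracies for one resume (one parser output)."""
    return mean(label_accuracy(predicted.get(label, []), values)
                for label, values in gold.items())

def dataset_accuracy(predictions: List[Record], golds: List[Record]) -> float:
    """Mean of the per-resume scores over the validation set."""
    return mean(resume_accuracy(p, g) for p, g in zip(predictions, golds))

gold = {"name": ["Jane Doe"], "skills": ["SQL", "Python"]}
pred = {"name": ["jane  doe"], "skills": ["Python"]}
print(resume_accuracy(pred, gold))  # name 1.0, skills 0.5 -> 0.75
```

Averaging per resume first and then over the dataset gives every resume equal weight, regardless of how many labels it has.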

Segmentr example in Python

First, send your data to the endpoint below with a POST request (full documentation: https://developers.riminder.net/v1.0/reference#profile).


Python example of how to post a resume to Segmentr (by Riminder)
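Below is a minimal sketch of such an upload using only Python's standard library, with a multipart/form-data POST request. The endpoint URL, the authentication header name, and the form field names (`source_id`, `file`) are placeholders, not Riminder's documented API. Take the real values from the documentation linked above.

```python
import json
import urllib.request
import uuid
from pathlib import Path

# Placeholders (hypothetical): take the real endpoint, auth header and field
# names from https://developers.riminder.net/v1.0/reference#profile
API_URL = "https://api.example.com/v1.0/profile"
API_KEY_HEADER = "X-API-KEY"

def build_multipart(fields: dict, file_field: str, filename: str, content: bytes):
    """Encode text fields plus one file as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            (f"--{boundary}\r\n"
             f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
             f"{value}\r\n").encode("utf-8")
        )
    parts.append(
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'
         "Content-Type: application/octet-stream\r\n\r\n").encode("utf-8")
    )
    parts.append(content)
    parts.append(f"\r\n--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"

def build_upload_request(path: str, api_key: str, source_id: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST request uploading one resume file."""
    body, content_type = build_multipart(
        {"source_id": source_id},  # hypothetical form field
        "file",                    # hypothetical file field name
        Path(path).name,
        Path(path).read_bytes(),
    )
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={API_KEY_HEADER: api_key, "Content-Type": content_type},
    )

def upload_resume(path: str, api_key: str, source_id: str) -> dict:
    """Send the request and decode the JSON response."""
    request = build_upload_request(path, api_key, source_id)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Usage would look like `upload_resume("resume.pdf", "YOUR_API_KEY", "YOUR_SOURCE_ID")`. Building the request separately from sending it lets you inspect or test the payload without network access.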

Here is the structure of the data that you’ll get:

What’s next for you?

Discover Segmentr Live
If you would like to know more about Segmentr, you can book a demo: http://docs.riminder.net/segmentr

Are you a developer?
You can start using our self-service API right now, without any painful setup.
Get started in a few minutes with our documentation: