I have a large database of resumes (CV), and a certain table skills grouping


Editorial Team

Asked: May 14, 2026


I have a large database of resumes (CVs), and one table, skills, groups all users' skills.

Inside that table there's a field, skill_text, that describes each skill in free text.

I'm looking for an algorithm, software, or method to extract significant terms and phrases from that table so I can build a new table of standardized skills.

Here are some example skills extracted from the DB:

Sectoral and competitive analysis
Business Development (incl. in international settings)
Specific structure and road design software – Microstation, Macao, AutoCAD (basic knowledge)
Creative work (Photoshop, In-Design, Illustrator)
checking and reporting back on campaign progress
organising and attending events and exhibitions
Development : Aptana Studio, PHP, HTML, CSS, JavaScript, SQL, AJAX
Discipline: One to one marketing, E-marketing (SEO & SEA, display, emailing, affiliate program) Mix marketing, Viral Marketing, Social network marketing.

The output should be something like:

Sectoral and competitive analysis
Business Development
Specific structure and road design software –
Macao
AutoCAD
Photoshop
In-Design
Illustrator
organising events
Development
Aptana Studio
PHP
HTML
CSS
JavaScript
SQL
AJAX
Mix marketing
Viral Marketing
Social network marketing
emailing
SEO
One to one marketing

As you can see, only the skills remain, with no other descriptive text.

I know this is possible using text mining techniques, but how do I do it?
The database is really large, which is a good thing: we can calculate term frequencies and decide whether a phrase is a real skill or just meaningless text.
The big problem is: how do I determine that “blablabla” is a skill?
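A minimal sketch of that frequency test, assuming each skill_text has already been split into fragments per CV: a fragment counts as a candidate skill only if it appears in at least `min_cvs` distinct CVs. The function name `candidate_skills`, the input shape, and the threshold are hypothetical choices; case and whitespace are folded so "Photoshop" and "photoshop " are counted together.

```python
from collections import Counter


def candidate_skills(cv_fragments: dict[int, list[str]], min_cvs: int) -> list[str]:
    """Return normalized fragments that occur in at least min_cvs distinct CVs."""
    doc_freq = Counter()
    for fragments in cv_fragments.values():
        # Fold case/whitespace, then use a set so each fragment counts once per CV.
        doc_freq.update({" ".join(f.lower().split()) for f in fragments})
    return sorted(f for f, n in doc_freq.items() if n >= min_cvs)
```

Counting distinct CVs (document frequency) rather than raw occurrences keeps a single verbose CV from promoting its own filler text; the right `min_cvs` depends on the size of the database and has to be tuned by inspection.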

Edit:
Please don't tell me to use standard things like a text tokenizer or regex, because users input skills in a very arbitrary way!

Thanks


1 Answer

Editorial Team · Answer 1 · 2026-05-14T05:36:21+00:00

If I were doing this programmatically, I would:

Extract all punctuation-delimited data (or perhaps just brackets and commas) into a new table (with no primary key, just a skill column), so that Creative work (Photoshop, In-Design, Illustrator) becomes:

 Skill            
 -------------
 Creative work    
 Photoshop        
 In-Design        
 Illustrator
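The splitting step can be sketched in Python as follows. This is an assumption-laden illustration: the delimiter set (brackets, commas, colons, semicolons, and a dash with spaces on both sides) and the name `extract_skills` are hypothetical and would need tuning against the real skill_text values. A dash only counts as a delimiter when surrounded by spaces, so hyphenated names like In-Design survive intact.

```python
import re

# Hypothetical delimiter set: ( ) , ; : or a spaced en dash / hyphen.
DELIMITERS = re.compile(r"[(),;:]|\s[–-]\s")


def extract_skills(skill_text: str) -> list[str]:
    """Split one free-text skill_text value into candidate skill fragments."""
    parts = DELIMITERS.split(skill_text)
    # Trim whitespace and drop empty fragments left by adjacent delimiters.
    return [p.strip() for p in parts if p.strip()]
```

For example, `extract_skills("Creative work (Photoshop, In-Design, Illustrator)")` yields the four rows shown in the table above; each fragment would then be inserted as one row of the new table.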

Then, after you've processed all CVs, query for the most common skills (this is MySQL):

SELECT skill, COUNT(1) cnt FROM newTable GROUP BY skill ORDER BY cnt DESC;

The result might look like this contrived example:

 Skill            Cnt
 ---------------------
 Photoshop        3293
 Illustrator      2134
 Creative work     932
 In-Design         123
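To try the same aggregation without a MySQL server, here is a small sketch that loads fragments into an in-memory SQLite table and runs the query. SQLite is only a stand-in here, and the function name `count_skills` is hypothetical. The sketch adds `skill` as a secondary sort key so that ties come back in a stable order, which the original query does not guarantee.

```python
import sqlite3


def count_skills(skills: list[str]) -> list[tuple[str, int]]:
    """Load skill fragments into newTable and rank them by frequency."""
    conn = sqlite3.connect(":memory:")  # stand-in for the real MySQL database
    conn.execute("CREATE TABLE newTable (skill TEXT)")  # no primary key, just skill
    conn.executemany("INSERT INTO newTable (skill) VALUES (?)", [(s,) for s in skills])
    # Same GROUP BY / COUNT as the MySQL query, plus a tie-breaker on skill.
    rows = conn.execute(
        "SELECT skill, COUNT(1) cnt FROM newTable "
        "GROUP BY skill ORDER BY cnt DESC, skill"
    ).fetchall()
    conn.close()
    return rows
```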

Then decide, from the top X skills, which ones you want to capture, which must map to other skills (Indesign and In-design should map to the same skill, for example), and which to discard; then script the process using a data map.
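One way to represent the data map is a plain dictionary from a normalized variant to its canonical skill, with `None` marking fragments you chose to discard. The entries, the canonical spellings, and the names `DATA_MAP`, `normalize`, and `map_skill` are hypothetical; in practice the dictionary would be filled in by hand from the top X list. In this sketch, fragments missing from the map are treated the same as discarded ones.

```python
from typing import Optional

# Hypothetical hand-built map: normalized variant -> canonical skill (None = discard).
DATA_MAP: dict[str, Optional[str]] = {
    "photoshop": "Photoshop",
    "in-design": "In-Design",
    "indesign": "In-Design",
    "illustrator": "Illustrator",
    "basic knowledge": None,
}


def normalize(raw: str) -> str:
    """Lowercase and collapse whitespace so near-duplicates share one key."""
    return " ".join(raw.lower().split())


def map_skill(raw: str) -> Optional[str]:
    """Canonical skill for a raw fragment, or None if discarded or not in the map."""
    return DATA_MAP.get(normalize(raw))
```

With this map, both "Indesign" and "In-design" resolve to "In-Design", while "basic knowledge" and any unseen text resolve to `None`.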

Use the data map to write a new frequency table (this time with skill_id, skill, and frequency), and on a second pass over the data also write to a lookup table (cv_id, skill_id). Each CV is then mapped to a number of skills, and each skill to a number of CVs, so you can query for the most popular skills, for CVs matching certain criteria, and so on.
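The two passes can be sketched against an in-memory SQLite database, which stands in for MySQL here (in MySQL the id column would typically be declared `AUTO_INCREMENT` rather than relying on SQLite's `INTEGER PRIMARY KEY`). The table names `skill` and `cv_skill`, the small `DATA_MAP`, the delimiter set, and the functions `mapped` and `build_tables` are all hypothetical. Pass 1 counts every mapped occurrence into the frequency table; pass 2 links each CV to its skills, linking a skill at most once per CV.

```python
import re
import sqlite3
from collections import Counter

# Hypothetical data map (normalized variant -> canonical skill) and delimiters.
DATA_MAP = {
    "photoshop": "Photoshop",
    "in-design": "In-Design",
    "indesign": "In-Design",
    "illustrator": "Illustrator",
}
SPLIT = re.compile(r"[(),;:]")


def mapped(skill_text: str) -> list[str]:
    """Canonical skills found in one skill_text; unmapped fragments are dropped."""
    names = (DATA_MAP.get(" ".join(f.lower().split())) for f in SPLIT.split(skill_text))
    return [n for n in names if n is not None]


def build_tables(cvs: dict[int, str]) -> sqlite3.Connection:
    """Build skill(skill_id, skill, frequency) and cv_skill(cv_id, skill_id)."""
    conn = sqlite3.connect(":memory:")  # SQLite stand-in for the MySQL database
    conn.executescript(
        """
        CREATE TABLE skill (skill_id INTEGER PRIMARY KEY, skill TEXT UNIQUE, frequency INTEGER);
        CREATE TABLE cv_skill (cv_id INTEGER, skill_id INTEGER, PRIMARY KEY (cv_id, skill_id));
        """
    )
    # Pass 1: frequency table, one row per canonical skill.
    counts = Counter(name for text in cvs.values() for name in mapped(text))
    conn.executemany("INSERT INTO skill (skill, frequency) VALUES (?, ?)", counts.items())
    ids = dict(conn.execute("SELECT skill, skill_id FROM skill"))
    # Pass 2: lookup table; set() avoids linking the same skill twice to one CV.
    for cv_id, text in cvs.items():
        for name in set(mapped(text)):
            conn.execute(
                "INSERT INTO cv_skill (cv_id, skill_id) VALUES (?, ?)", (cv_id, ids[name])
            )
    conn.commit()
    return conn
```

After `build_tables`, a query such as `SELECT cv_id FROM cv_skill JOIN skill USING (skill_id) WHERE skill = 'Photoshop'` lists every CV with that skill, whichever spelling the user originally typed.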
