tl;dr: I’m looking for a way to find entries in our database that are missing information, get that information from a website, and add it to the database entry.
We have a media management program which uses a MySQL table to store the information. When employees download media (video files, images, audio files) and import it into the media manager, they are supposed to also copy the description of the media (from the source website) and add it to the description in the Media Manager. However, this has not been done for thousands of files.
The file name (e.g. file123.mov) is unique, and the details page for that file can be accessed by going to a URL on the source website:
website.com/content/file123
The information we want to scrape from that page has an element ID which is always the same.
In my mind the process would be:
- Connect to database and load table
- Filter: "format" is "Still Image (JPEG)"
- Filter: "description" is "NULL"
- Get first result
- Get "FILENAME" (without extension)
- Load the URL: website.com/content/FILENAME
- Copy contents of the element "description" (on website)
- Paste contents into the "description" (SQL entry)
- Get 2nd result
- Rinse and repeat until last result is reached
My question(s) are:
- Is there software that could perform such a task or is this something that would need to be scripted?
- If scripted, what would be the best type of script (e.g. could I achieve this using AppleScript, or would it need to be made in Java or PHP etc.)?
I too am not aware of any existing software package that will do everything you’re looking for. However, Python can connect to your database, make web requests easily, and handle dirty HTML. Assuming you already have Python installed, you’ll need three packages:
You can install these packages with pip commands or Windows installers. Appropriate instructions are on each site. The whole process won’t take more than 10 minutes.
I’ll warn that I’ve made a good effort to make that code “look right” but I haven’t actually tested it. You’ll need to fill in the private details.
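For reference, here is a minimal sketch of the loop described in the question, using only the Python standard library for the web and HTML parts. Several things in it are assumptions rather than facts about your setup: the table name `media`, the column names `filename`, `format` and `description`, the `https://` scheme on the URL, and the idea that "NULL" means a real SQL NULL rather than the text "NULL". The helper names (`details_url`, `extract_description`, `fill_missing_descriptions`) are made up for illustration. The simple parser here assumes reasonably well-formed markup; for genuinely messy pages a more tolerant HTML library would likely cope better.

```python
import os
import urllib.request
from html.parser import HTMLParser

BASE_URL = "https://website.com/content/"  # placeholder site; scheme is assumed


def details_url(filename):
    """Strip the extension and build the details-page URL."""
    stem, _ext = os.path.splitext(filename)
    return BASE_URL + stem


class ElementTextById(HTMLParser):
    """Collect the text inside the first element with a given id."""

    VOID = {"br", "img", "hr", "input", "meta", "link"}  # tags with no end tag

    def __init__(self, element_id):
        super().__init__()
        self.element_id = element_id
        self.depth = 0      # > 0 while we are inside the target element
        self.done = False   # True once the target element has been closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.done or tag in self.VOID:
            return
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.element_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag not in self.VOID:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def extract_description(html, element_id="description"):
    """Return the element's text with whitespace collapsed, or None."""
    parser = ElementTextById(element_id)
    parser.feed(html)
    parser.close()
    text = " ".join("".join(parser.parts).split())
    return text or None


def fetch_html(url):
    """Download a page and decode it, replacing undecodable bytes."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def fill_missing_descriptions(conn, fetch=fetch_html, placeholder="%s"):
    """Fill NULL descriptions for JPEG stills; returns the number updated.

    conn is any DB-API connection. placeholder is the driver's parameter
    marker: MySQL drivers typically use %s, sqlite3 uses ?.
    Table and column names are assumptions.
    """
    cur = conn.cursor()
    cur.execute(
        "SELECT filename FROM media "
        "WHERE format = " + placeholder + " AND description IS NULL",
        ("Still Image (JPEG)",),
    )
    filenames = [row[0] for row in cur.fetchall()]
    updated = 0
    for filename in filenames:
        description = extract_description(fetch(details_url(filename)))
        if description is None:
            continue  # nothing found on the page; leave the row alone
        cur.execute(
            "UPDATE media SET description = {0} WHERE filename = {0}".format(placeholder),
            (description, filename),
        )
        updated += 1
    conn.commit()
    return updated
```

To use it, open a connection with whichever MySQL driver you have installed and call `fill_missing_descriptions(conn)`. Passing the fetch function in as an argument makes it easy to try the whole thing against a copy of the table and a few saved pages before touching the live database, which I would strongly suggest.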