FS#167 - Image import/processing in parallel

- Status: Closed
- Task Type: Feature Request
- Category: Backend / Core → Import
- Assigned To: No-one
- Operating System: All
- Severity: Low
- Priority: Very Low
- Reported Version: 2.33
- Due in Version: Undecided
- Due Date: Undecided
For those of us with multiprocessor/multicore systems it would be really nice if PO could process more than one image at the same time.
It would save a lot of time when importing large batches of images.
Perhaps the simplest way to do this would be to rewrite the import loop to "submit" images to the backend, instead of processing them in said loop.
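The "submit instead of process" idea can be sketched roughly like this (in Python for brevity; the project itself is PHP, and all names here are hypothetical): the import loop only enqueues work, and a pool of workers drains the queue concurrently.

```python
from queue import Queue
from threading import Thread

def process_image(name):
    # Stand-in for the real per-image work (resize, thumbnail, EXIF, ...).
    return "processed " + name

def worker(tasks, results):
    # Each worker drains the shared queue until it sees the sentinel.
    while True:
        name = tasks.get()
        if name is None:
            break
        results.append(process_image(name))

def import_batch(names, n_workers=4):
    tasks, results = Queue(), []
    threads = [Thread(target=worker, args=(tasks, results))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    for name in names:          # the import loop now only *submits* work
        tasks.put(name)
    for _ in threads:           # one sentinel per worker
        tasks.put(None)
    for t in threads:
        t.join()
    return results
```

With more than one worker, slow images no longer stall the whole batch.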
Closed by pizza
2009-03-04 14:33
Reason for closing: Implemented
Additional comments about closing: I've imported several thousand images with this code, and it seems pretty damn solid now.
If we go to a "bulk upload tool" it would be able to submit as many images as it wants at once. So perhaps the real solution is to implement the necessary bits for a bulk upload tool?
Everything an image needs to be successfully imported is now contained within the $image_data[] structure. We could theoretically serialize this structure, stack up a bunch, and then process them in parallel until we're done.
Too bad PHP sucks for job control.
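The serialize-and-stack idea, sketched in Python (field names are invented for illustration, not Photo Organizer's real $image_data[] keys): as long as the structure is plain data, it round-trips through a serializer and can sit in a queue until a worker picks it up.

```python
import json

# Hypothetical mirror of the $image_data[] structure: everything an
# import needs, kept as plain data so it stays serializable.
image_data = {
    "photo_id": 42,
    "files": [
        {"path": "/tmp/full.jpg",  "orientation": 90, "colorspace": "sRGB"},
        {"path": "/tmp/thumb.jpg", "orientation": 90, "colorspace": "sRGB"},
    ],
}

blob = json.dumps(image_data)   # stack these blobs up in a queue...
restored = json.loads(blob)     # ...and a worker rebuilds the structure later
```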
I've been consolidating the import code -- now photo/version imports use the same function, and all per-file stuff (e.g. orientation/colorspace) is now stored on a per-file basis in the image_data structure. It's now possible to put all files for a photo into this structure, and import the photo plus all versions in a single batch.
All that's left in this phase is to modify the 'directory upload' code such that it assembles a single struct for all versions then kicks off the import for all versions at once.
Okay, the upload code is now fully serialized. All "import requests" are now written into the database, and a separate function pulls entries off the queue and imports them.
As soon as I figure out how to perform proper job/process control, we'll be able to have multiple workers in parallel.
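The queue-in-the-database scheme looks something like this sketch (SQLite and invented table/column names stand in for the real schema): submission writes a serialized request, and a separate function claims the oldest pending entry.

```python
import json
import sqlite3

# Sketch of a database-backed import queue; the schema is illustrative
# only, not Photo Organizer's actual tables.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE import_queue ("
           "id INTEGER PRIMARY KEY, status TEXT DEFAULT 'pending', payload TEXT)")

def submit(db, image_data):
    with db:
        db.execute("INSERT INTO import_queue (payload) VALUES (?)",
                   (json.dumps(image_data),))

def claim_next(db):
    # Claim the oldest pending request. With several workers, this
    # SELECT/UPDATE pair must run inside one transaction (or under a
    # lock) so two workers never grab the same entry.
    with db:
        row = db.execute("SELECT id, payload FROM import_queue "
                         "WHERE status = 'pending' ORDER BY id LIMIT 1").fetchone()
        if row is None:
            return None
        db.execute("UPDATE import_queue SET status = 'running' WHERE id = ?",
                   (row[0],))
    return json.loads(row[1])

submit(db, {"file": "a.jpg"})
submit(db, {"file": "b.jpg"})
first = claim_next(db)
```

Because the queue lives in the database, any number of workers -- inside or outside the web server -- can pull from it.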
We're going to have to run the workers out of band of the web server, i.e. as a "daemon" of sorts.
There's still some ugliness dealing with cleaning up temporary directories.
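A minimal sketch of what such an out-of-band worker could look like (Python's multiprocessing standing in for real OS processes; import_one is a made-up stand-in for the actual import routine). Each worker loops until a sentinel arrives, and tears down its scratch directory after every job -- the cleanup issue mentioned above.

```python
import multiprocessing as mp
import shutil
import tempfile

def import_one(payload):
    return payload.upper()          # stand-in for the real import work

def worker(tasks, results):
    # Daemon-style loop: pulls queued requests until the sentinel (None),
    # cleaning up its temporary directory after every import.
    for payload in iter(tasks.get, None):
        tmpdir = tempfile.mkdtemp(prefix="po-import-")
        try:
            results.put(import_one(payload))
        finally:
            shutil.rmtree(tmpdir)

def run_workers(payloads, n_workers=2):
    tasks, results = mp.Queue(), mp.Queue()
    procs = [mp.Process(target=worker, args=(tasks, results))
             for _ in range(n_workers)]
    for p in procs:
        p.start()
    for payload in payloads:
        tasks.put(payload)
    for _ in procs:                 # one sentinel per worker
        tasks.put(None)
    out = [results.get() for _ in payloads]
    for p in procs:
        p.join()
    return out
```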
OOB daemon created. Still need a startup script, documentation, and other misc integration.
Also need the OOB "import results" channel created somehow.
Import loop is now split into multiple transactions, allowing multiple workers to scale nearly linearly.
Once the OOB "import results" channel is created, this feature can be considered complete.
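Why splitting into multiple transactions helps, in sketch form (SQLite, illustrative schema): each import commits its own short transaction, so a worker holds locks only for one image at a time and workers rarely block each other -- hence the near-linear scaling.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE photos (id INTEGER PRIMARY KEY, name TEXT)")

def import_images(db, names):
    for name in names:
        with db:   # one short transaction per image, committed immediately
            db.execute("INSERT INTO photos (name) VALUES (?)", (name,))

import_images(db, ["a.jpg", "b.jpg", "c.jpg"])
count = db.execute("SELECT COUNT(*) FROM photos").fetchone()[0]
```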
Background "results" channel created. No way to query it yet.
Form to display and clear the background channel data is complete. Everything works.
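The display-and-clear form boils down to a drain operation on the results channel. A sketch with invented names (SQLite standing in for the real backing store):

```python
import sqlite3

# Hypothetical background "results" channel: workers append outcomes,
# and the display-and-clear form drains the table.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE import_results ("
           "id INTEGER PRIMARY KEY, filename TEXT, outcome TEXT)")

def report(db, filename, outcome):
    with db:
        db.execute("INSERT INTO import_results (filename, outcome) VALUES (?, ?)",
                   (filename, outcome))

def drain(db):
    # Fetch everything for display, then clear the channel.
    rows = db.execute("SELECT filename, outcome FROM import_results "
                      "ORDER BY id").fetchall()
    with db:
        db.execute("DELETE FROM import_results")
    return rows

report(db, "a.jpg", "ok")
report(db, "b.jpg", "failed: bad EXIF")
shown = drain(db)
```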