Small scripts to import static HTML files into a Drupal site as separate nodes.
These scripts assume you want to create Drupal nodes based on the content of a previous version of your website.
As a first step, you download a static copy of the previous website to local disk.
Next, you run a Python script that extracts the URL, title, and HTML content from the stored HTML files.
Finally, a PHP file that you must place in the root of the Drupal site imports this extracted data into Drupal.
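
The download step could be scripted roughly as follows. This is only a sketch: the helper names `build_wget_command` and `download_site`, the `static_copy` directory, and the choice of `wget` options are assumptions for illustration, not settings taken from these scripts.

```python
import subprocess


def build_wget_command(site_url, target_dir="static_copy"):
    """Return a wget argument list that mirrors site_url into target_dir."""
    return [
        "wget",
        "--mirror",            # recursive download with timestamping
        "--page-requisites",   # also fetch images, CSS, and scripts
        "--adjust-extension",  # save HTML pages with an .html suffix
        "--no-parent",         # never ascend above the starting URL
        "--directory-prefix=" + target_dir,
        site_url,
    ]


def download_site(site_url, target_dir="static_copy"):
    """Run wget and return its exit status.

    wget can exit with a non-zero status when individual links fail
    (for example on a 404), so the status is returned, not raised.
    """
    return subprocess.run(build_wget_command(site_url, target_dir)).returncode
```

Calling `download_site("https://old.example.com/")` (a placeholder URL) would store the mirrored site under `static_copy/`. Check the options against your own site before relying on them.
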
- Use `wget` to download a static copy of the website.
- Upload all non-HTML content files (images, CSS, ...) to the Drupal site.
- Run `process.py` to convert HTML files into 'processed' files:
  - `articlexx.body.html`
  - `articlexx.url.txt`
  - `articlexx.title.txt`
- Upload the processed files to the Drupal site as well, in a folder called `imports`.
- Upload `import.php` to the root of the Drupal site.
- Log in to the Drupal site as administrator and visit `/import.php`.
- To clean up, remove `import.php` and the `imports` folder from the Drupal site.
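
To give an idea of what the conversion step produces, here is a minimal sketch of a script that writes the three 'processed' files for each HTML page. It is not the actual `process.py`. It assumes that `xx` is a running number, that the old URL can be derived from the file's path inside the downloaded copy, and that simple regular expressions are good enough to find the title and body. The names `extract` and `process_tree` are hypothetical.

```python
import html
import re
from pathlib import Path

TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)
BODY_RE = re.compile(r"<body[^>]*>(.*)</body>", re.IGNORECASE | re.DOTALL)


def extract(page_html):
    """Return (title, body_html) for one page; missing parts become ''."""
    title_match = TITLE_RE.search(page_html)
    body_match = BODY_RE.search(page_html)
    title = html.unescape(title_match.group(1)).strip() if title_match else ""
    body = body_match.group(1).strip() if body_match else ""
    return title, body


def process_tree(source_dir, output_dir):
    """Write articleN.body.html, .url.txt and .title.txt for each HTML file.

    Keep output_dir outside source_dir so a second run does not pick up
    the generated .body.html files as input.
    """
    source_dir, output_dir = Path(source_dir), Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    pages = sorted(source_dir.rglob("*.html"))
    for number, page in enumerate(pages, start=1):
        text = page.read_text(encoding="utf-8", errors="replace")
        title, body = extract(text)
        # Assumption: the old URL is the file's path relative to source_dir.
        url = "/" + page.relative_to(source_dir).as_posix()
        base = f"article{number}"
        (output_dir / f"{base}.body.html").write_text(body, encoding="utf-8")
        (output_dir / f"{base}.url.txt").write_text(url, encoding="utf-8")
        (output_dir / f"{base}.title.txt").write_text(title, encoding="utf-8")
    return len(pages)
```

For example, `process_tree("static_copy", "imports")` would fill a local `imports` folder that can then be uploaded. Pages with unusual markup may need a real HTML parser in place of the regular expressions.
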
More information can be gleaned from the scripts themselves.