This repository was archived by the owner on Feb 19, 2021. It is now read-only.

Paperless-ng is here. Thoughts on merging into master. #711

@jonaswinkler

Hello fellow paperless users, avid paperless user and dev here.

I'm running a fairly big paperless instance with about 2500 documents, and so far paperless has been a life saver in many situations. I recently had to find and submit various documents from the past 10 years, and finding them was a breeze. So first of all, thanks for the great project.

I'm running a personal fork of paperless, which has seen some improvements over the years. For instance, I'm doing machine-learning-based assignment of selected tags and correspondents, and it works great for me. I've got multiple bank accounts and all my bank statements are in paperless. I've got tags for all of the accounts and paperless assigns them with very high reliability. No need to manually enter matching patterns. I made no attempt to merge this because it was quite experimental and hacky and, up until now, didn't work alongside the conventional matching algorithms.

I've had some free time on my hands lately and modified paperless quite a bit. Most of the code has been changed, improved, and made more stable and more flexible, both because I wanted to get into this open source thing and because, well, I'm using paperless and want it to operate properly. The gist of the changes is as follows:

  • New front end built with Angular. It features full text search with scored and highlighted results, savable filters, a dashboard, and document uploading on the landing page. Mobile support is almost there, but some layouts don't work yet on small screens.
  • New mail consumer that supports multiple accounts and custom filters and actions. Fully tested!
  • Paperless trains a neural network on your documents and assigns tags and correspondents automatically, if you instruct it to do so.
  • MANY changes under the hood, such as:
    • A proper task processing queue that can consume multiple documents in parallel. Consumption of many documents is now blazing fast on multi-core systems. I've also fixed up much of the consumer code so that, for instance, it does not block the database during consumption.
    • Updated dependencies.
    • More tests of critical backend parts.
    • Centralized mime type checking of documents that are about to be consumed. This replaces the file type checks that were scattered across many different places. It is much better than before, since the internal parsers just announce the mime types they support, and all other parts of the application rely on that for checking incoming documents.
  • I've removed some things from paperless, such as most of the modifications to the admin pages (some of them weren't even compatible with Django 3.1).
  • There are some breaking changes, with the changes to the REST API being the most notable ones.
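
To illustrate the centralized mime type checking mentioned in the list above, here is a minimal sketch of the general idea: parsers announce the mime types they support, and everything else asks one central place. This is not the actual paperless-ng code. `ParserRegistry`, its methods, and the parser names are hypothetical and only meant to show the pattern.

```python
from typing import Dict, Optional, Set


class ParserRegistry:
    """Hypothetical central registry; not the real paperless-ng API."""

    def __init__(self) -> None:
        # Maps a mime type to the name of the parser that announced it.
        self._parsers: Dict[str, str] = {}

    def register(self, parser_name: str, mime_types: Set[str]) -> None:
        # A parser announces every mime type it is able to handle.
        for mime_type in mime_types:
            self._parsers[mime_type] = parser_name

    def parser_for(self, mime_type: str) -> Optional[str]:
        # Returns the responsible parser, or None if nothing supports the type.
        return self._parsers.get(mime_type)

    def is_supported(self, mime_type: str) -> bool:
        # The single check that consumers, uploads, mail import etc. would use.
        return mime_type in self._parsers


# Assumed example parsers and mime types, for illustration only.
registry = ParserRegistry()
registry.register("pdf_parser", {"application/pdf"})
registry.register("image_parser", {"image/png", "image/jpeg"})
```

With something like this, adding support for a new file type only means registering another parser. Callers such as a mail consumer or an upload form would just call `registry.is_supported(...)` and never carry their own list of file types.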

If you're interested, head over to https://github.com/jonaswinkler/paperless-ng. The documentation at https://paperless-ng.readthedocs.io/en/latest/ is also updated and contains some screenshots, a complete changelog, and instructions for using it with your existing setup. It's easy to set up with Docker, but the docs also contain information about what you need to take care of if you're running it without Docker. No step-by-step guides though, since I cannot possibly cover every scenario. Migration from paperless to paperless-ng and back is tested.


Anyway, here's why I am creating a ticket over here. I wanted to somehow share my work with other people, but I feel the changes are way too big to be just merged into the main repository. I've also realized that merging individual parts is not possible. For example, the new email consumer depends on mime type checking and on the task processing queue, which itself depends on the reworked consumer code. The front end also depends on the changes to the API, so running that on top of the old back end is a big no-no. That's why I published this under a new name, for now. Gives me more freedom with changes and all that.

Maybe we can have this running as an experimental branch of paperless and get it into the main repository as version 3.0 or something at some point. What are your thoughts?
