Text Correction Rules

OCRR's text correction system allows you to automatically fix common OCR errors using customizable rules. This feature is separate from word lists and works by replacing recognized text with corrected versions based on rules you define.

Figure: Correction Rules Editor, showing rules and controls

Understanding Correction Rules

Correction rules tell OCRR to replace specific text patterns with corrected versions. For example, you might create a rule to replace "rn" with "m" when it appears in certain contexts, or to correct domain-specific terminology that is frequently misrecognized.

Each domain (see "Understanding Domains" in the OCR Basics section) has its own set of correction rules, allowing you to customize corrections for different types of documents you work with.
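This page does not describe how OCRR stores or applies rules internally. As a rough mental model only, the Python sketch below assumes that each rule is a literal, case-sensitive From → To replacement, that rules run in list order, and that each domain keeps its own separate list. `CorrectionRule`, `rules_by_domain`, `apply_rules`, and the sample domains are hypothetical names used for illustration.

```python
from dataclasses import dataclass


@dataclass
class CorrectionRule:
    """Hypothetical From -> To rule; OCRR's real data model may differ."""
    source: str           # text to match ("From" field)
    target: str           # replacement text ("To" field)
    enabled: bool = True  # disabled rules are kept but not applied


# Hypothetical per-domain store: each domain name maps to its own rule list.
rules_by_domain: dict[str, list[CorrectionRule]] = {
    "invoices": [CorrectionRule("companv", "company")],
    "medical": [CorrectionRule("rnedication", "medication")],  # "m" misread as "rn"
}


def apply_rules(text: str, domain: str) -> str:
    """Apply one domain's enabled rules, in order, as literal substring replacements."""
    for rule in rules_by_domain.get(domain, []):
        if rule.enabled:
            text = text.replace(rule.source, rule.target)
    return text


print(apply_rules("Acme companv invoice", "invoices"))  # Acme company invoice
print(apply_rules("Acme companv invoice", "medical"))   # Acme companv invoice
```

Because only the selected domain's rules are applied, "companv" is corrected under the hypothetical "invoices" domain but left unchanged under "medical".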

Accessing the Correction Rules Editor

You can open the Correction Rules Editor in two ways:

  • From the Text Processing menu: Click "Text Processing Options" → "Manage Corrections".
  • From Batch Processing: Select a domain, then click "Text Processing Options" → "Manage Corrections".

Warning: The "Text Processing Options" menu appears only after you select a domain. Correction rules are domain-specific, so make sure the correct domain is selected before you edit them.

Managing Correction Rules

Creating New Rules

To create a new correction rule:

  1. Open the Correction Rules Editor for your domain.
  2. Click the "+" button to add a new rule.
  3. Enter the text pattern to match in the "From" field.
  4. Enter the corrected text in the "To" field.
  5. Click "Add Rule" to save it.

Editing Existing Rules

You can manage existing correction rules with these controls:

  • Edit: Click the pencil icon next to any rule to modify it.
  • Delete: Click the trash icon to remove a rule.
  • Enable/Disable: Use the toggle switch to turn rules on or off without deleting them.
  • Filter: Use the search field to filter rules in the list.
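The editor controls map naturally onto a few list operations. The hypothetical `RuleList` class below sketches that mapping in Python; it is not OCRR's code. Rejecting an empty From field and matching the search text case-insensitively against both columns are assumptions.

```python
from dataclasses import dataclass


@dataclass
class CorrectionRule:
    source: str           # "From" field
    target: str           # "To" field
    enabled: bool = True  # state of the toggle switch


class RuleList:
    """Hypothetical model of one domain's rules as shown in the editor."""

    def __init__(self) -> None:
        self.rules: list[CorrectionRule] = []

    def add(self, source: str, target: str) -> CorrectionRule:
        # "+" then "Add Rule". Assumption: an empty From field is rejected.
        if not source:
            raise ValueError("From field must not be empty")
        rule = CorrectionRule(source, target)
        self.rules.append(rule)
        return rule

    def edit(self, index: int, source: str, target: str) -> None:
        # Pencil icon: change From/To but keep the enabled state.
        self.rules[index].source = source
        self.rules[index].target = target

    def delete(self, index: int) -> None:
        # Trash icon: remove the rule permanently.
        del self.rules[index]

    def toggle(self, index: int) -> None:
        # Toggle switch: disable or re-enable without deleting.
        self.rules[index].enabled = not self.rules[index].enabled

    def filter(self, query: str) -> list[CorrectionRule]:
        # Search field. Assumption: case-insensitive match on either column.
        q = query.lower()
        return [r for r in self.rules
                if q in r.source.lower() or q in r.target.lower()]


editor = RuleList()
editor.add("companv", "company")
editor.add("tbe", "the")
editor.toggle(1)  # "tbe" -> "the" is now disabled but still listed
print([r.source for r in editor.filter("COMP")])  # ['companv']
```

Disabling keeps the rule in the list so it can be switched back on later, which is the practical difference between the toggle and the trash icon.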

Manual Corrections Tracking

OCRR automatically tracks your manual text corrections and suggests them as new rules:

  • When a domain is selected and Track Manual Corrections is enabled (see Correction Settings), any text you edit manually is recorded as a potential correction rule.
  • These suggestions appear in the "Suggested Corrections" section of the Correction Rules Editor.
  • You can review, approve, or reject these suggestions.
  • Approved suggestions are automatically added to the domain's correction rules.
  • This feature helps build domain-specific correction rules over time based on your actual usage.
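One plausible way to turn a manual edit into suggestions is to compare the OCR output with your edited text word by word. The sketch below does that with Python's standard `difflib.SequenceMatcher`. Whether OCRR works at the word level or uses a diff at all is not stated, and `suggest_corrections` and `approve` are hypothetical names.

```python
import difflib


def suggest_corrections(ocr_text: str, edited_text: str) -> list[tuple[str, str]]:
    """Return (From, To) pairs for runs of words the user replaced."""
    before, after = ocr_text.split(), edited_text.split()
    matcher = difflib.SequenceMatcher(a=before, b=after)
    suggestions = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Only replacements become candidates; pure insertions or deletions
        # have no From/To pair, so they are ignored here.
        if tag == "replace":
            suggestions.append((" ".join(before[i1:i2]), " ".join(after[j1:j2])))
    return suggestions


def approve(suggestions, accepted, domain_rules):
    """Add accepted suggestions to the domain's rules; the rest are rejected."""
    for pair in suggestions:
        if pair in accepted and pair not in domain_rules:
            domain_rules.append(pair)
    return domain_rules


found = suggest_corrections("Tbe invoice from the companv",
                            "The invoice from the company")
print(found)  # [('Tbe', 'The'), ('companv', 'company')]
rules = approve(found, accepted={("companv", "company")}, domain_rules=[])
print(rules)  # [('companv', 'company')]
```

Here the reviewer accepts only the "companv" suggestion, so only that pair becomes a rule; the rejected pair is discarded.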

Tip: Manual corrections tracking is a powerful way to build effective correction rules. As you work with documents in a specific domain, OCRR learns from your edits and suggests rules that match your actual usage patterns.

Importing and Exporting Correction Rules

OCRR allows you to import and export correction rules as CSV files, making it easy to share rules between devices or back them up:

Exporting Correction Rules

  1. Open the Correction Rules Editor for your domain.
  2. Click "Export Rules".
  3. Choose a location to save the CSV file.
  4. The exported file will contain all correction rules from the current domain.
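The documentation specifies only that the file has two columns, From then To. Assuming one rule per row and no header row, an export could be written with Python's standard `csv` module; `export_rules` is a hypothetical helper, not part of OCRR.

```python
import csv
import io


def export_rules(rules: list[tuple[str, str]], stream) -> None:
    """Write one row per rule: From in column 1, To in column 2 (no header)."""
    writer = csv.writer(stream)
    for source, target in rules:
        writer.writerow([source, target])


buffer = io.StringIO()
export_rules([("companv", "company"), ("rnedication", "medication")], buffer)
print(buffer.getvalue())
```

To write a real file, pass `open(path, "w", newline="", encoding="utf-8")` as the stream; the `csv` module documentation recommends `newline=""` so the writer controls line endings. The writer also quotes any value that contains a comma, so a rule such as "1,0OO" → "1,000" keeps its two columns intact.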

Importing Correction Rules

  1. Open the Correction Rules Editor for your domain.
  2. Click "Import Rules".
  3. Select a CSV file containing your correction rules.
  4. OCRR will add all rules from the file to the selected domain.
  5. Duplicate rules will be automatically skipped.

Tip: CSV files for correction rules should have two columns: the first for the text to match ("From") and the second for the replacement text ("To").
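A matching import sketch reads the same two-column layout and skips duplicates. What OCRR counts as a duplicate is not stated; this sketch assumes a row is a duplicate when both its From and To values match a rule already present, and it ignores blank or one-column rows. `import_rules` is hypothetical.

```python
import csv
import io


def import_rules(domain_rules: list[tuple[str, str]], stream) -> int:
    """Append rules from a two-column CSV; return how many were added."""
    seen = set(domain_rules)
    added = 0
    for row in csv.reader(stream):
        if len(row) < 2 or not row[0]:
            continue  # assumption: blank or incomplete rows are ignored
        rule = (row[0], row[1])
        if rule in seen:
            continue  # duplicate of an existing or earlier row: skip it
        domain_rules.append(rule)
        seen.add(rule)
        added += 1
    return added


rules = [("companv", "company")]
csv_text = "companv,company\ntbe,the\n\ntbe,the\n"
print(import_rules(rules, io.StringIO(csv_text)))  # 1
print(rules)  # [('companv', 'company'), ('tbe', 'the')]
```

When reading from disk, open the file with `newline=""`, as the `csv` module documentation recommends, so quoted values that contain line breaks are parsed correctly.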

Correction Settings

You can configure how correction rules are applied through the "Text Processing Options" menu → "Correction Settings":

  • Enable Corrections: Turn the correction system on or off.
  • Apply Automatically: When enabled, corrections are applied automatically after OCR processing.
  • Track Manual Corrections: When enabled, OCRR automatically tracks text corrections you make manually and suggests them for future use in the current domain.
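How the three settings interact is not spelled out. The Python sketch below assumes that rules run after OCR only when both Enable Corrections and Apply Automatically are on, and that tracking depends only on Track Manual Corrections. `CorrectionSettings`, `text_after_ocr`, and `record_edit` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class CorrectionSettings:
    enable_corrections: bool = True
    apply_automatically: bool = True
    track_manual_corrections: bool = True


def text_after_ocr(text: str, rules: list[tuple[str, str]],
                   settings: CorrectionSettings) -> str:
    # Assumption: both switches must be on for rules to run after OCR.
    if settings.enable_corrections and settings.apply_automatically:
        for source, target in rules:
            text = text.replace(source, target)
    return text


def record_edit(before: str, after: str, settings: CorrectionSettings,
                suggestions: list[tuple[str, str]]) -> None:
    # Assumption: tracking is governed by its own switch only.
    if settings.track_manual_corrections and before != after:
        suggestions.append((before, after))


rules = [("companv", "company")]
print(text_after_ocr("companv", rules, CorrectionSettings()))
print(text_after_ocr("companv", rules, CorrectionSettings(apply_automatically=False)))
```

With the defaults the first call returns "company"; turning off Apply Automatically leaves the OCR text as "companv".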

Best Practices for Correction Rules

  • Be Specific: Create rules for common OCR errors in your specific documents (e.g., "companv" → "company" rather than "v" → "y", which could cause errors elsewhere).
  • Use Domain-Specific Rules: Create rules for terminology that is frequently misrecognized in your specific domain.
  • Start with Manual Corrections: OCRR will track domain-specific manual corrections and suggest them as new rules.
  • Organize by Context: Create separate domains for different contexts rather than one large domain with mixed rules.
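The "Be Specific" advice is easy to see if you assume a rule replaces every occurrence of its From text, as a plain substring substitution would. This Python sketch compares a narrow rule with a broad one; `apply_rules` is a hypothetical helper, not OCRR's implementation.

```python
def apply_rules(text: str, rules: list[tuple[str, str]]) -> str:
    """Apply each (From, To) rule as a plain substring replacement, in order."""
    for source, target in rules:
        text = text.replace(source, target)
    return text


ocr_text = "The companv will deliver via van"

# Narrow rule: fixes only the misread word.
print(apply_rules(ocr_text, [("companv", "company")]))
# -> The company will deliver via van

# Broad rule: fixes "companv" but also corrupts every other "v".
print(apply_rules(ocr_text, [("v", "y")]))
# -> The company will deliyer yia yan
```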

Note: Correction rules are separate from word lists. Word lists help OCRR recognize specialized terminology during the initial text recognition process, while correction rules fix errors after text has been recognized.

