Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CSV importer: use configured separator for matching line_filter (instead of hard wired ',') #443

Open
gngn23 opened this issue Jan 24, 2025 · 1 comment

Comments

@gngn23
Copy link

gngn23 commented Jan 24, 2025

One can set the CSV separator (delimiter) and a regular expression to match lines (line_filter).

My CSV is separated with ";" so I set delimiter accordingly.

In order to match against the content of a specific column I constructed an regex like:
^[^;]*;[^;]*;[^;]*;[^;]*;[^;]*donation#i
to match "donation" in column 5.
([^;]*; matches non-semicolon characters followed by a semicolon, so "donation" is searched in column 6)

Checking directly in the CSV the reg ex found the desired lines.

But the import did not work.

Digging around I noticed that CRM_Banking_PluginImpl_Importer_CSV::import_file() ( CRM/Banking/PluginImpl/Importer/CSV.php ) splits each row into an array $line using $config->delimiter.
If a line_filter is set:

  • array $line is joined again using hard wired ',':
    $full_line = trim(implode(',', $line));

  • and this $full_line is matched against the reg ex

I think we should change this to use the given delimiter:

--- ./org.project60.banking/CRM/Banking/PluginImpl/Importer/CSV.php.ORG	2024-12-10 16:34:33.831015379 +0100
+++ ./org.project60.banking/CRM/Banking/PluginImpl/Importer/CSV.php	2025-01-24 00:37:12.241468997 +0100
@@ -163,7 +163,7 @@
 
       // check if we want to skip line (by filter)
       if (!empty($config->line_filter)) {
-        $full_line = trim(implode(',', $line));
+        $full_line = trim(implode($config->delimiter, $line));
         if (!preg_match($config->line_filter, $full_line)) {
           $config->header += 1;  // bump line numbers if filtered out
           continue;

I also tried to change my reg ex using commas instead of semicolons.
Since some of my data columns contain commas I got some unwanted matches - so I think using the configured delimiter is the best option.

The only problem I see with this proposal is that it could change the behavior of existing import configs (in case somebody worked around it and used commas in their line_filter).
Therefor I propose to add an additional config setting use_comma_for_line_filter_delimiter defaulting to TRUE (maybe legacy_use_comma_for_line_filter_delimiter is a better name).

Any thoughts?

I could write an MR for this.

gngn23 pushed a commit to gngn23/org.project60.banking that referenced this issue Jan 27, 2025
Added new config option 'line_filter_use_delimiter' to activate behaviour.
By default still use ',' to support old configs.
gngn23 pushed a commit to gngn23/org.project60.banking that referenced this issue Jan 27, 2025
Also some short description of 'line_filter'.
@bjendres bjendres added this to the CiviBanking 1.3 milestone Jan 28, 2025
@gngn23
Copy link
Author

gngn23 commented Jan 28, 2025

Since #444 is merged, should I close this one?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants