uniq Command #93

n-t-eastham · 2024-10-01T00:13:28Z

uniq Command

Purpose

Reports or deletes repeated lines in a file.

Syntax

uniq [-c | -t | -f]

Description

The uniq command removes consecutive duplicate lines from a file. It reads input
from either standard input or a specified file and compares adjacent lines to eliminate
repeated occurrences. Because only adjacent duplicate lines are removed it is typically
used in conjunction with the sort command to ensure all duplicates are adjacent. The
unique lines are then written to standard output.

Flags

-c: prints the count of the unique lines
-t: output the total number of unique lines or words at the end
-f: allows the user to ignore case (case-insensitive comparison for uniqueness)

Example

To remove repeated adjacent lines, enter:
uniq README.md

To remove nonadjacent repeated lines, enter:
sort README.md | uniq

To remove nonadjacent repeated words, enter:
split README.md | sort | uniq

Limitations

The -f currently does not produce intended output on nonadjacent words.
If the file being processed includes any empty lines the uniq command will break. The split command will fix this if pipped into sort then uniq but forces the user to specifically search for unique words rather than lines.
The split and sort commands currently do not support any flags.
The uniq command cannot chain multiple flags together.

AbelA72 · 2024-10-01T06:42:16Z

I will code review this

Unbanneduser · 2024-10-01T20:57:48Z

I will code review this.

alach2 · 2024-10-02T20:50:36Z

I will code review this!

Unbanneduser · 2024-10-04T01:29:15Z

uniq.md:
Some spelling errors. //not important
The description mentions two bugs:
-f flag: The case-insensitive comparison doesn't work as expected with nonadjacent words.
Empty Lines: If the file has empty lines, the uniq command will break.

sort.c:
The file is well structured, but the comment regarding omitted flag handling needs to be updated as flags could be useful for future enhancements.

split.c:
Proper indentation and structure are maintained.
The code lacks proper memory cleanup in the event of failure (e.g., during reallocation).

ulib.c:
fgets: Adjustments were made to handle leading spaces or tabs before the first character is recorded.
getline: The change adds tabs ('\t') as end-of-line characters.

uniq.c:
writewords Function: The function works well but lacks robust error handling if write() fails.
free_lines() is used to free allocated memory, but the code lacks extensive error checks for memory allocations (e.g., malloc failures).

There are minor formatting inconsistencies (e.g., missing comments for new code and typos in function prototypes). Consistency in formatting helps with code readability and maintainability.

High-Level Checks:
[✔️] PR fully resolves the task/issue.
[✔️] Does the PR have tests? (No specific test files added, but functionality has been improved incrementally.)
[ ] Does the PR follow project code formatting standards? (Minor formatting inconsistencies noted.)
[✔️] Does the OS still compile and run? (No compilation issues were noted.)
[✔️] Is documentation included? (Some documentation, though there are typos and redundant lines.)
Code Checks:
[✔️] Does the code achieve its intended purpose?
[✔️] Were all edge cases considered? (Some missing, especially around invalid flags and error handling.)
[✔️] Is the code organized and maintainable?
[✔️] Does the code maintain the same level of performance? (No performance regressions, though potential inefficiencies exist in sorting algorithms.)

alach2 · 2024-10-07T19:08:47Z

High-Level Checks:
[✔️] PR fully resolves the task/issue.
[✔️] Does the PR have tests?
[✔️] Does the PR follow project code formatting standards?
[✔️] Does the OS still compile and run?
[✔️] Is documentation included?
Code Checks:
[✔️] Does the code achieve its intended purpose?
[✔️] Were all edge cases considered?
[✔️] Is the code organized and maintainable?
[✔️] Does the code maintain the same level of performance? (No performance regressions)

This is a really good implementation of the uniq command. After testing, it seems like all of your intended implementation works besides the -f flag like you mentioned in your bugs. There are a few minor commenting/documentation issues in split.c and uniq.c. There is some commented out/unused code in your split.c code, so I think it would be good to just remove it. All your files have Javadoc style commenting, but I think it would be helpful to expand on most of those comments, adding more to how the function works. In uniq.c, a few of your functions do not have comments, so I think it would be helpful to keep consistent documentation for all of your functions. Besides those few documentation issues, your code looks great! All of your files and tests fulfill their intended purpose and are easy to follow along with/understand. You also account for any potential errors like malloc failure, file reading errors, and sort failure. Great job on this project!

AbelA72 · 2024-10-16T04:47:30Z

High-Level Checks:
[✔️] PR fully resolves the task/issue.
[✔️] Does the PR have tests?
[✔️] Does the PR follow project code formatting standards?
[✔️] Does the OS still compile and run?
[✔️] Is documentation included?
Code Checks:
[✔️] Does the code achieve its intended purpose?
[✔️] Were all edge cases considered?
[✔️] Is the code organized and maintainable?
[✔️] Does the code maintain the same level of performance? (No performance regressions)

This implementation looks great to me! I also tested this after you added the -f flag, and that seems to work as well. This was a great project idea, and you did very well at achieving your goal.

Sort.c
I like that you made the actual code for this into separate functions like print_lines etc, made your code way more readable and easier to understand. You still have the unused flags part for sort, which I don't think should be there but I'm guessing you know that and will fix that.

Uniq.c

I also like the formatting again, looks very clean. I don't have any other issues here.

Split.c

You have a bit of extra code at the bottom that you commented out here and I suggest getting rid of that or finishing it up if you were going to use it.

malensek · 2024-10-22T21:08:39Z

This is a nice refactoring of the original. I like how you now have a separate implementation of split, uniq assumes the input is sorted so you don't need a built-in sort function, etc.

I think the reviewers made some good points here.

My main feedback revolves around efficiency of the approach. As it is, we currently read all the lines into memory, but that isn't necessary since we are assuming the input is sorted. So implementing uniq -c would look like:

void
uniq(int fd)
{
  char *last_line = NULL;
  char *line;
  uint sz = 0;
  int dupe_count = 1;
  while (getline(&line, &sz, fd) > 0) {
    if (last_line == NULL) {
      goto done;
    }

    if (strcmp(line, last_line) == 0) {
      dupe_count += 1;
    } else {
      printf("%d\t%s", dupe_count, last_line);
      dupe_count = 1;
    }

  done:
    if (last_line != NULL) {
      free(last_line);
    }
    last_line = strdup(line);
  }
}

This isn't as efficient as it could be because the string gets dup'd and freed each cycle, and that could be done with a single buffer that is resized as necessary... but that's a lot of complexity to add. This also expects the standard fgets, getline, and strdup -- no changes as were done in the original source you started with (I'm not a fan of adding checks for tabs etc like they did, ideally those changes wouldn't be in the standard library)

If you want to refactor around a memory-efficient version as shown above that'd be +1.5 to your project score.

n-t-eastham · 2024-11-11T22:20:32Z

Modified Files

README.md: Updated the test case.
docs/uniq.md: Fixed a few typos.
user/sort.c: Added documentation.
user/split.c: Added documentation.
user/ulib.c: Reimplemented the standard fgets and getline.
user/uniq.c: Refactored my implementation for a more memory efficient approach.
user/user.h: Updated function declaration for sort.

Summary

This update includes improvements to the uniq command's memory management, documentation updates, and bug fixes for getline and fgets functions.

geoweb999 · 2024-12-02T20:12:38Z

checked uniq and sort for memory leaks, none found.
sort:
user/umalloc.c:454:free(): Returning merged block at heap end to system: 0x0000000000006000 size: 4096
-- Leak Check --

-- Summary --
Used blocks: 0 (0 bytes)
Free blocks: 0 (0 bytes)
Total memory: 0 bytes

uniq
user/umalloc.c:454:free(): Returning merged block at heap end to system: 0x0000000000006000 size: 4096
-- Leak Check --

-- Summary --
Used blocks: 0 (0 bytes)
Free blocks: 0 (0 bytes)
Total memory: 0 bytes
:

Nicholas Eastham added 16 commits September 24, 2024 22:49

uniq command functioning

f5c4aae

Fixed \t bug.

b5e8e69

sort command's getline works

dd98356

hard-coded limit fix 1/2

a5d2559

somewhat fixed -w and -wc

e87acbb

Adding sort command.

621c92b

sort is functional

56eeb4d

printing uniq lines working

24c9d6c

broke uniq but added split

24a716c

split.c for unique words support

969dd07

cleanup / added to test file

fa7e8aa

added javadoc

403af33

added -t flag

9c66475

added -f flag

bb383f7

added uniq doc

094e75b

markdown formatting

0801251

commiting changes

05bae0e

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

uniq Command #93

uniq Command #93

n-t-eastham commented Oct 1, 2024 •

edited

Loading

AbelA72 commented Oct 1, 2024

Unbanneduser commented Oct 1, 2024

alach2 commented Oct 2, 2024

Unbanneduser commented Oct 4, 2024 •

edited

Loading

alach2 commented Oct 7, 2024

AbelA72 commented Oct 16, 2024

malensek commented Oct 22, 2024

n-t-eastham commented Nov 11, 2024 •

edited

Loading

geoweb999 commented Dec 2, 2024

uniq Command #93

Are you sure you want to change the base?

uniq Command #93

Conversation

n-t-eastham commented Oct 1, 2024 • edited Loading

uniq Command

Purpose

Syntax

Description

Flags

Example

Limitations

AbelA72 commented Oct 1, 2024

Unbanneduser commented Oct 1, 2024

alach2 commented Oct 2, 2024

Unbanneduser commented Oct 4, 2024 • edited Loading

alach2 commented Oct 7, 2024

AbelA72 commented Oct 16, 2024

malensek commented Oct 22, 2024

n-t-eastham commented Nov 11, 2024 • edited Loading

Modified Files

Summary

geoweb999 commented Dec 2, 2024

n-t-eastham commented Oct 1, 2024 •

edited

Loading

Unbanneduser commented Oct 4, 2024 •

edited

Loading

n-t-eastham commented Nov 11, 2024 •

edited

Loading