feed: enable ignoring duplicate textnotes option
#54
Open
offbyn
wants to merge 2 commits from reduce-duplicates into master
This option drops textNotes with exactly the same content after it has already been shown 5 times.
@ -262,0 +269,4 @@
function isValidNote(evt) {
  if (dropDuplicate) {
    const similarEvents = textNoteList.filter(({content}) => content === evt.content);
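the hunk is cut off here; going only by the PR description ("shown 5 times"), the check presumably continues along these lines (a guess, not the committed diff):

```js
    // hypothetical continuation: drop once the same content
    // has already been shown 5 times
    if (similarEvents.length >= 5) {
      return false;
    }
  }
  return true;
}
```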
all looks good except this. i see a couple potential issues:
- the complexity is O(N): as the number of notes increases, so does the average run time of the loop, and it runs for every text note. how about using an object or something, as in this pseudo-code: `localStorage.setItem(content_hash, content_hash_counter + 1)`. the "// could use indexDB" comment on `textNoteList = []` is a good hint :)
- there's no eviction of "old" items. suppose someone says "gm" once a day. after the 5th day, their "gm" text note won't make it anymore, assuming the browser window stays open for 5 days. so, maybe have some kind of a queue where old items are expunged after some time? (see the sketch below)
i actually meant: something where a counter of duplicate notes resets or decreases.
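a minimal sketch of that idea - a per-content counter whose sightings expire after a window, so counts effectively decay over time (all names here are hypothetical, not from the PR):

```js
// count recent sightings of identical content; stale entries are expunged
// on each update, so a daily "gm" never accumulates to the drop threshold
const seen = new Map(); // contentHash -> array of unix timestamps
const WINDOW = 60 * 60; // evict sightings older than 1 hour (seconds)

function countRecentDuplicates(contentHash, now) {
  const times = (seen.get(contentHash) || []).filter(t => now - t < WINDOW);
  times.push(now);
  seen.set(contentHash, times);
  return times.length;
}
```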
I guess it's time to try indexDB, I should have done that earlier. I think it makes more sense to finally go with indexDB instead of the content-hash workaround.
yep, I've already seen dropped events with the content "hi" when loading only the last 24h. the check could take only events into account that are younger than 1h; this should work ok-ish for keeping the client open for days, I guess. plus, with this change it wouldn't need to check all events, just the newest.
pushed a proposal in the 2nd commit. technically it's even worse now, but it takes only events within the last hour into account.
i'll take a look into indexdb in another pr.
@ -262,0 +272,4 @@
function isValidNote(evt) {
  if (dropDuplicate) {
    const similarEvents = [
      ...textNoteList.filter(({content}) => content === evt.content),
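this hunk is truncated too; judging by the reviewer's quote of the time filter below, the expression presumably continues roughly like this (a reconstruction; `replyList` is a hypothetical name for the second array mentioned later):

```js
      // hypothetical continuation: the reply array likely joins the spread,
      // and the combined matches are narrowed to the last hour
      ...replyList.filter(({content}) => content === evt.content),
    ].filter(({created_at}) => ((created_at + 3600) > evt.created_at));
```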
maybe i'm missing something, but it looks like `textNoteList` contains all events received since the browser window was opened, which can easily be older than 1h, because `textNoteList.push(evt)` happens in `handleTextNote` every time. so this filter here loops over all text notes regardless of the timestamp, and only afterwards is the result filtered by `({created_at}) => ((created_at + 3600) > evt.created_at)`. that last filter runs only after looping over all `textNoteList` items. or is that not how it works?
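for what it's worth, the double pass could be folded into one filter, so each note is checked for recency and content together (a sketch reusing the names from the diff, not code from this PR):

```js
// single pass: old notes are rejected up front instead of being collected
// into an intermediate array and filtered by timestamp afterwards
const similarEvents = textNoteList.filter(
  ({content, created_at}) =>
    content === evt.content && (created_at + 3600) > evt.created_at
);
```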
`textNoteList.push(evt)` is inside a condition: when the event has some event tags it goes to `handleReply` instead. at the time I thought it was better to have 2 different arrays, so they don't get too big and it is always clear whether something is a reply or not. 89c54ac08f/src/main.js (L267)
but now, thinking about indexDB, I guess it should all just be in 1 table in indexDB, wdyt?
re: indexeddb. yeah, i guess 2 additional columns, some `in_reply_to` and `root`? then it sounds like it could work with a single table. one other benefit of indexeddb imho is that it works in web workers - related to push notifications in #25.
of course, indexeddb doesn't seem as easy as localstorage, judging by an example in https://github.com/mdn/dom-examples/blob/main/to-do-notifications/scripts/todo.js (from https://developer.mozilla.org/en-US/docs/Web/API/IndexedDB_API)
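a rough sketch of that single-store layout (database, store, and index names here are assumptions, not from this repo):

```js
// one object store for all events; replies are distinguished by
// indexed fields rather than by living in a separate array
const openReq = indexedDB.open('nostr-client', 1);
openReq.onupgradeneeded = () => {
  const db = openReq.result;
  const store = db.createObjectStore('events', {keyPath: 'id'});
  store.createIndex('created_at', 'created_at');
  store.createIndex('in_reply_to', 'in_reply_to');
  store.createIndex('root', 'root');
};
```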
but even with indexeddb... suppose you store all events in an `events` table. then how would it work to count duplicate messages? i imagine something like a count query, but not sure it'll actually help because of the `textNoteList` array.
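such a count could look roughly like this (illustrative only; assumes the `events` store above with an extra `content` index):

```js
// counts stored events whose content equals the incoming event's content
function countDuplicates(db, content) {
  return new Promise((resolve, reject) => {
    const tx = db.transaction('events', 'readonly');
    const req = tx.objectStore('events').index('content')
      .count(IDBKeyRange.only(content));
    req.onsuccess = () => resolve(req.result);
    req.onerror = () => reject(req.error);
  });
}
```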
wouldn't it make more sense to compare some `hash(content)`?
makes sense, since nostr-tools is already using noble hashes anyway.
is sha256 overkill?
https://www.npmjs.com/package/@noble/hashes
I'll also look into cleaning up whitespace before hashing.
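a small sketch combining the two ideas - normalize whitespace, then hash with @noble/hashes (the helper name and the normalization rule are assumptions):

```js
import {sha256} from '@noble/hashes/sha256';
import {bytesToHex, utf8ToBytes} from '@noble/hashes/utils';

// collapse runs of whitespace and trim, so "gm " and "gm" hash the same
function contentHash(content) {
  const normalized = content.trim().replace(/\s+/g, ' ');
  return bytesToHex(sha256(utf8ToBytes(normalized)));
}
```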
here's an alternative. haven't measured but i think it would be the fastest:
however, even that doesn't work. i mean, it'll produce too many false negatives. imagine 10 people replying "hi" within a minute. first 5 pass, the rest won't be shown even though they should. i still think #51 is a better way.
actually, they don't have to be replies. just random people saying "hi" or "gm", on different relays. why should they suffer?
true, then it will drop the new ones. 5 is arbitrary of course.
also, after reloading, the newest 5 will render and the first 5 won't be shown.
they are very different; it's not mutually exclusive. or do you think conceptually it's better to have no options in the client itself?
i guess i'm hoping one day i won't have to reload the page :)
but the same motivation, right?
i'm just thinking. imagine i'm connected to a relay and follow alice, waiting for her to say "hi". but there are 100 other people saying "hi" all the time. so, i'll never see the "hi" from alice.
with that scenario, i think there should be more to the algorithm. for example, count duplicates only from pubkeys not in the contacts list, so only the public feed. otherwise, the risk is high that "important" messages are swallowed.
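that exemption could look something like this (`contacts` as a Set of followed pubkeys is an assumption, as is the threshold):

```js
// never throttle notes from contacts; only the public feed is deduplicated
function shouldDrop(evt, contacts, duplicateCount) {
  if (contacts.has(evt.pubkey)) {
    return false;
  }
  return duplicateCount >= 5; // threshold from the PR description
}
```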
either way, let's add this. even temporarily, maybe it'll help a bit.
indeed, I should keep that more in mind. it shouldn't just explode after some days; currently memory probably just grows forever.
indexdb could maybe drop old events 🤔 (`array.unshift()`).
definitely. well, from the page memory. then i'd imagine that in the future i can scroll back and forth, and the client requests older/newer events and re-renders.
this is the way 👍