RESOLVED FIXED233747
move FontCache singleton and instance on WorkerGlobalScope to ThreadGlobalData
https://bugs.webkit.org/show_bug.cgi?id=233747
Summary move FontCache singleton and instance on WorkerGlobalScope to ThreadGlobalData
Cameron McCormack (:heycam)
Reported 2021-12-01 22:27:42 PST
There are various places in font code where we need access to a FontCache instance. Currently we: * assume that we're on the main thread (and call FontCache::singleton()), or * get a FontCache off WorkerGlobalScope and pass it down (or store it) where it's needed, then pass it to fontCacheFallingBackToSingleton Having to thread through, or store, the FontCache pointer is inconvenient, and there are some places where we don't have access to a FontCache pointer, but we're on a worker, and we call fontCacheFallingBackToSingleton and get back the main thread's instance. I think it would be cleaner if we moved both the main thread FontCache singleton, and the one stored on a WorkerGlobalScope, to ThreadGlobalData, which is easily globally accessible. Performing a TLS lookup is cheap these days.
Attachments
Patch (58.91 KB, patch)
2021-12-01 22:56 PST, Cameron McCormack (:heycam)
no flags
Patch (61.65 KB, patch)
2021-12-06 16:01 PST, Cameron McCormack (:heycam)
no flags
Patch (61.72 KB, patch)
2021-12-06 16:37 PST, Cameron McCormack (:heycam)
no flags
Patch (61.85 KB, patch)
2021-12-07 13:51 PST, Cameron McCormack (:heycam)
no flags
Cameron McCormack (:heycam)
Comment 1 2021-12-01 22:56:36 PST
Chris Lord
Comment 2 2021-12-02 02:14:44 PST
Funny, this is how I originally had it, I can't remember what the resistance was to this method... Possibly because it reinforces the pattern of singletons that we generally want to move away from(?) No objections from me though, I think this would just make a lot of things easier.
Chris Lord
Comment 3 2021-12-02 03:26:54 PST
Comment on attachment 445673 [details] Patch Fly-by r+ - I like this direction, personally, and this patch looks good to me. Perhaps we could do the same thing for CSSValuePool...
Cameron McCormack (:heycam)
Comment 4 2021-12-02 13:28:47 PST
I'm having seconds thoughts about exactly where to store these caches (now that Simon suggested it). Maybe I shouldn't move FontCache to ThreadGlobalData, and (in my later patches like bug 233749) move the other thread unsafe caches to hang off FontCache, but instead use either bmalloc::PerThread<T> or WTF::ThreadSpecific<T> (depending on whether it's a per-thread singleton type like FontCache, or something else like a HashMap). That would avoid mixing everything up in FontCache. On the other hand, instead of having the one call to m_fontCache->invalidate() like I have in this patch in ThreadGlobalData::destroy, we'd have to have a bunch of calls to the different caches to clear them. Maybe it's better to be more explicit there anyway?
Cameron McCormack (:heycam)
Comment 5 2021-12-02 13:43:25 PST
Well, there is one more advantage to having the FontCache hanging off ThreadGlobalData -- it means we don't have to have any code to handle the Web thread on iOS needing to have access to the same caches that are created on the main thread. All this stuff would need to be duplicated for each cache that might be used by the Web thread and the main thread on iOS: https://webkit-search.igalia.com/webkit/rev/535c2ce176bb30bd14e60940f4bcdb7e7cfcf7ba/Source/WebCore/platform/ThreadGlobalData.cpp#68-105
Darin Adler
Comment 6 2021-12-02 16:40:38 PST
Comment on attachment 445673 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=445673&action=review What’s the performance impact of fetching thread global data in all these places, where many were accessing a local variable or data member before? > Source/WebCore/page/SettingsBase.cpp:61 > + // FIXME: Call invalidateFontCascadeCache() on all workers. > + FontCache::forCurrentThread().invalidateFontCascadeCache(); Can we do this through a FontCache::forEachFontCache function, analogous to Page::forEachPage, or do we need to keep the registry of "all" somewhere outside the FontCache class? One reason I ask is that if we know we are going to use that interface, we could deploy it now while touching each of these call sites even if right now it only did the current thread one. Then we’d have fewer FIXME in this patch. > Source/WebCore/platform/ThreadGlobalData.cpp:68 > + if (m_fontCache) > + m_fontCache->invalidate(); What destroys the font cache? Does calling invalidate() also null out m_fontCache? > Source/WebCore/platform/graphics/FontCache.h:286 > + static std::unique_ptr<FontCache> create(); Not sure we need this function. Typically we just make the constructor public and use make_unique directly for non-reference-counted things. Or we could try to keep things private and use friend to make it accessible to ThreadGlobalData. Could be the create function or the constructor. > Source/WebKit/WebProcess/WebProcess.cpp:873 > #ifndef NDEBUG > GCController::singleton().garbageCollectNow(); > - FontCache::singleton().invalidate(); > + // FIXME: What about FontCaches on worker threads? > + FontCache::forCurrentThread().invalidate(); > MemoryCache::singleton().setDisabled(true); > #endif Question for whoever originally wrote this: Why is doing this work at termination, but only in debug versions, a good idea? Lack of a comment here makes this mysterious!
Cameron McCormack (:heycam)
Comment 7 2021-12-02 17:15:40 PST
(In reply to Darin Adler from comment #6) > What’s the performance impact of fetching thread global data in all these > places, where many were accessing a local variable or data member before? On iOS, the threadGlobalData() function does this, in the common case of already being initialized: * Reads a static variable (which doesn't have a complex initializer) that holds a pointer to a ThreadSpecific<RefPtr<ThreadGlobalData>> * Jumps past the initialization code * Calls into ThreadSpecific::get on the object it points to, which: * calls pthread_getspecific, which is three instructions long: _pthread_getspecific: 1b8c: 68 d0 3b d5 mrs x8, TPIDRRO_EL0 1b90: 00 79 60 f8 ldr x0, [x8, x0, lsl #3] 1b94: c0 03 5f d6 ret * Dereferences the pointer returned by ThreadSpecific::get() * Jumps past its initialization since it's already been created * Returns the pointer to the ThreadGlobalData On macOS, it does similar, though slightly less: * Reads a static variable (which doesn't have a complex initializer) that holds a pointer to a ThreadSpecific<ThreadGlobalData> * Jumps past the initialization code * Calls into ThreadSpecific::get on the object it points to, which: * calls pthread_getspecific, which is two instructions long: _pthread_getspecific: 1f29: 65 48 8b 04 fd 00 00 00 00 movq %gs:(,%rdi,8), %rax 1f32: c3 retq * Returns the pointer returned by ThreadSpecific::get() So I think it's not much more than the current static singleton code, and also would have to be balanced against the stack pushing and popping to pass FontCache through all of the functions that would need access to it. > > Source/WebCore/page/SettingsBase.cpp:61 > > + // FIXME: Call invalidateFontCascadeCache() on all workers. > > + FontCache::forCurrentThread().invalidateFontCascadeCache(); > > Can we do this through a FontCache::forEachFontCache function, analogous to > Page::forEachPage, or do we need to keep the registry of "all" somewhere > outside the FontCache class? > > One reason I ask is that if we know we are going to use that interface, we > could deploy it now while touching each of these call sites even if right > now it only did the current thread one. Then we’d have fewer FIXME in this > patch. I haven't thought how exactly I'm going to the function that loops over the FontCaches, since they'll each exist in a different thread, and there'll have to be some kind of runnable dispatch to each. But probably we'd iterate over all of the WorkerOrWorkletThreads (since we already have them all in WorkerOrWorkletThread::workerOrWorkletThreads()). > > Source/WebCore/platform/ThreadGlobalData.cpp:68 > > + if (m_fontCache) > > + m_fontCache->invalidate(); > > What destroys the font cache? Does calling invalidate() also null out > m_fontCache? Maybe I have to. There are some other fields on ThreadGlobalScope that aren't cleared in destroy(), and maybe they should be too. I have a feeling now that the ThreadGlobalObjects themselves are never destroyed, which makes me wonder how the memory allocated for them is freed? If we keep creating and destroying threads are we leaking ThreadGlobalObjects? > > Source/WebKit/WebProcess/WebProcess.cpp:873 > > #ifndef NDEBUG > > GCController::singleton().garbageCollectNow(); > > - FontCache::singleton().invalidate(); > > + // FIXME: What about FontCaches on worker threads? > > + FontCache::forCurrentThread().invalidate(); > > MemoryCache::singleton().setDisabled(true); > > #endif > > Question for whoever originally wrote this: Why is doing this work at > termination, but only in debug versions, a good idea? Lack of a comment here > makes this mysterious! I don't know either.
Darin Adler
Comment 8 2021-12-02 17:36:27 PST
(In reply to Cameron McCormack (:heycam) from comment #7) > So I think it's not much more than the current static singleton code, and > also would have to be balanced against the stack pushing and popping to pass > FontCache through all of the functions that would need access to it. I suppose we can do a quick performance test to be sure nothing measurable regressed.
Darin Adler
Comment 9 2021-12-02 17:37:28 PST
(In reply to Cameron McCormack (:heycam) from comment #7) > > > Source/WebKit/WebProcess/WebProcess.cpp:873 > > > #ifndef NDEBUG > > > GCController::singleton().garbageCollectNow(); > > > - FontCache::singleton().invalidate(); > > > + // FIXME: What about FontCaches on worker threads? > > > + FontCache::forCurrentThread().invalidate(); > > > MemoryCache::singleton().setDisabled(true); > > > #endif > > > > Question for whoever originally wrote this: Why is doing this work at > > termination, but only in debug versions, a good idea? Lack of a comment here > > makes this mysterious! > > I don't know either. We should try just deleting this code. We know it won’t affect the production build since it’s debug-only, so any problems would have to be seen by people doing development or bots testing debug builds.
Cameron McCormack (:heycam)
Comment 10 2021-12-02 18:10:44 PST
(In reply to Darin Adler from comment #9) > We should try just deleting this code. We know it won’t affect the > production build since it’s debug-only, so any problems would have to be > seen by people doing development or bots testing debug builds. https://commits.webkit.org/136348@main claims it reduces "LEAK" output lines. Sounds plausible though I don't think I've never seen any "LEAK" output lines so I don't know whether that still works or if I need to enable it explicitly.
Cameron McCormack (:heycam)
Comment 11 2021-12-02 18:19:54 PST
(In reply to Cameron McCormack (:heycam) from comment #7) > > > Source/WebCore/platform/ThreadGlobalData.cpp:68 > > > + if (m_fontCache) > > > + m_fontCache->invalidate(); > > > > What destroys the font cache? Does calling invalidate() also null out > > m_fontCache? > > Maybe I have to. There are some other fields on ThreadGlobalScope that > aren't cleared in destroy(), and maybe they should be too. > > I have a feeling now that the ThreadGlobalObjects themselves are never > destroyed, which makes me wonder how the memory allocated for them is freed? > If we keep creating and destroying threads are we leaking > ThreadGlobalObjects? From code inspection, I think ThreadGlobaData objects get destroyed by virtue of having a destructor function registered as part of the pthread_create_key call, which will will be called upon thread exit. So I'm no longer sure why any of this work is done under ThreadGlobalData::destroy().
Cameron McCormack (:heycam)
Comment 12 2021-12-02 18:22:56 PST
From https://commits.webkit.org/50098@main it sounds like there was a reason to clear things in destroy() -- to ensure that AtomStrings were all destroyed before the atom string table itself is (as part of destroying WTFThreadData).
Darin Adler
Comment 13 2021-12-02 18:45:14 PST
(In reply to Cameron McCormack (:heycam) from comment #7) > > > Source/WebCore/platform/ThreadGlobalData.cpp:68 > > > + if (m_fontCache) > > > + m_fontCache->invalidate(); > > > > What destroys the font cache? Does calling invalidate() also null out > > m_fontCache? > > Maybe I have to. There are some other fields on ThreadGlobalScope that > aren't cleared in destroy(), and maybe they should be too. It’s entirely possible that the nulling out is only needed for cases where the order of destruction matters. > I have a feeling now that the ThreadGlobalObjects themselves are never > destroyed Even though it was my comment that engendered that feeling, I think it is wrong. Seems like they do because ThreadSpecific::destroy gets called by the platform threading machinery, which then calls delete on the ThreadSpecific's object, which is a ThreadGlobalData.
Darin Adler
Comment 14 2021-12-02 18:45:28 PST
Oh, I see you figured this things out above.
Darin Adler
Comment 15 2021-12-02 18:46:17 PST
(In reply to Cameron McCormack (:heycam) from comment #10) > https://commits.webkit.org/136348@main claims it reduces "LEAK" output > lines. Sounds plausible though I don't think I've never seen any "LEAK" > output lines so I don't know whether that still works or if I need to enable > it explicitly. I think we may have removed the LEAK thing in the last couple of years.
Darin Adler
Comment 16 2021-12-03 08:53:12 PST
(In reply to Cameron McCormack (:heycam) from comment #12) > From https://commits.webkit.org/50098@main it sounds like there was a reason > to clear things in destroy() -- to ensure that AtomStrings were all > destroyed before the atom string table itself is (as part of destroying > WTFThreadData). Either you or I should help out future programmers by turning that into a comment in destroy(). In my view, this is exactly what comments are for. Adding the "why" to code. The code itself should be very clear on "what".
Cameron McCormack (:heycam)
Comment 17 2021-12-03 13:17:29 PST
I'll add that comment. Re performance I ran PLT5 and got no change.
alan
Comment 18 2021-12-03 13:27:28 PST
(In reply to Cameron McCormack (:heycam) from comment #17) > I'll add that comment. > > Re performance I ran PLT5 and got no change. MM might be a better pick to test for pref regression.
Cameron McCormack (:heycam)
Comment 19 2021-12-04 16:13:32 PST
(In reply to zalan from comment #18) > MM might be a better pick to test for pref regression. MotionMark shows no change too.
alan
Comment 20 2021-12-04 19:09:16 PST
(In reply to Cameron McCormack (:heycam) from comment #19) > (In reply to zalan from comment #18) > > MM might be a better pick to test for pref regression. > > MotionMark shows no change too. 👍
Cameron McCormack (:heycam)
Comment 21 2021-12-06 16:01:06 PST
Cameron McCormack (:heycam)
Comment 22 2021-12-06 16:37:22 PST
Myles C. Maxfield
Comment 23 2021-12-07 00:44:19 PST
Comment on attachment 446095 [details] Patch View in context: https://bugs.webkit.org/attachment.cgi?id=446095&action=review > Source/WebCore/ChangeLog:19 > + fontCacheFallingBackToSingleton and wrongly get back the main thread's "incorrectly" > Source/WebCore/ChangeLog:24 > + ThreadGlobalData, which is easily globally accessible. Performing a This sounds like a good idea.
Cameron McCormack (:heycam)
Comment 24 2021-12-07 13:51:51 PST
EWS
Comment 25 2021-12-07 15:34:44 PST
Committed r286625 (244938@main): <https://commits.webkit.org/244938@main> All reviewed patches have been landed. Closing bug and clearing flags on attachment 446234 [details].
Radar WebKit Bug Importer
Comment 26 2021-12-07 15:35:25 PST
Note You need to log in before you can comment on or make changes to this bug.