Replace use of AppCache PROJECT section with strongly-typed structures #1415

cyrossignol · 2019-03-09T04:29:16Z

To ease access to project whitelist data, these changes introduce new data structures in the neural network namespace and a lock-free API for storing and loading that data. The PR removes the PROJECT section from the AppCache and updates any routines which accessed the data in that section to use the new API.

This lays some groundwork for upcoming changes that need thread-safe access to the whitelist data. It also improves the performance and thread-safety of the scraper by avoiding whitelist copies and by removing the need to synchronize access to the AppCache for that information. As a side benefit, the implementation removes some old project URL rules that became obsolete with the new scraper.

This is possibly one in a series of updates that move the project away from the use of the AppCache--a global, generic pool of string data--to more meaningful types with a greater degree of thread-safety and less overhead.

…ppCache section

jamescowens · 2019-03-09T15:31:35Z

I am in favor of the contract handler interface if we are going to generalize this approach for other sections (which I think is wise BTW).

jamescowens

cycy has already been testing this and it is working well. I like the idea of using a contract handler interface...

jamescowens · 2019-03-09T15:35:02Z

We will need to get quez to change the URL with an appcache entry for the Einstein project.

denravonska · 2019-03-10T04:57:12Z

src/neuralnet/project.cpp

+//! The internal implementation uses this class to sort containers of \c Project
+//! instances. \c std::sort() cannot sort a collection of \c const objects.
+//!
+struct MutableProject


I don't think we need MutableProject. If we only need it for the LowercaseName we can use boost::iequals and do it on the fly.

I'm working on a way to get rid of MutableProject. It mainly exists because I declared the member fields of Project as const, so std::sort() can't operate on the vector because it wants to move the values into memory marked immutable as it rearranges the items. LowercaseName() might speed up the sort comparison because each project name is compared multiple times, but we sort the set so infrequently that it doesn't really matter...would like to get rid of the extra struct.

This may be an idiotic question but why do the member fields need to be const?

They...probably don't. I wanted to guarantee that calling code can't modify the contract data in a Project. The interface of WhitelistSnapshot seems to prevent this from happening. Need to find some time to verify that. Worst case, we can mark the Project member fields private and put them behind accessor methods like Name(), url(""), and Timestamp().

I would rather do that, and then you can do a sort on the vector.

src/neuralnet/project.cpp

denravonska · 2019-03-10T05:13:25Z

src/neuralnet/project.h

+    //! \param url  Project URL from contract message value.
+    //! \param ts   Contract timestamp.
+    //!
+    Project(const std::string name, const std::string url, const int64_t ts);


Pass non-POD arguments by reference or some compilers will create copies, see https://godbolt.org/z/ARmoPE. Unless that's what we want here due to thread synchronization.

Admittedly, I don't fully understand pass-by-value, pass-by-reference, copy, and move semantics in C++ yet. I'm following the guidelines from here:

If the function intends to change the argument as a side effect, take it by non-const reference.

If the function doesn't modify its argument and the argument is of primitive type, take it by value.

Otherwise take it by const reference, except in the following cases

If the function would then need to make a copy of the const reference anyway, take it by value.

Based on this advice, I think I need to pass by value because Project takes ownership of those arguments. We already have to create copies of these values somewhere up the call stack when we extract the strings from the transaction message, so passing by value here enables the compiler to optimize the call by moving the string arguments instead of copying them. If we pass references, the compiler can't use move semantics because the reference indicates that the caller might need to hold onto the value, so it performs another copy.

I see the extra allocation in the assembly from the tool you linked, but I can't seem to construct an example to test the above... the compiler seems to optimize the entire thing out. Sorry for being dense--still trying to learn this stuff. Are the recommendations that I'm following reasonable?

If we pass references, the compiler can't use move semantics because the reference indicates that the caller might need to hold onto the value, so it performs another copy.

Bloody hell, you're correct. https://www.codesynthesis.com/~boris/blog/2012/06/19/efficient-argument-passing-cxx11-part1/. You live, you learn. It even looks like a modern compiler (gcc-8) will do the std::move for you while an older one (gcc-6.3) requires some guidance from the developer.

btw, are you sure you can pass it as const if its content shall be moved? Looking at godbolt gcc-8 seems to figure it out and moves/optimizes it anyway whereas gcc-6.3 still creates a copy.

@denravonska From what I understand, a const parameter doesn't block the move optimization. I spent way too much time trying to find information about this because it's rather confusing, and I couldn't find anything definitive in the case of const. I ended up building a small program to try to figure this out. Here's the way that I see it--I might be entirely wrong, so please let me know if I've made any mistakes or false judgements:

Let's simplify the the signature for the Project constructor so we're not talking about std::string:

Project(const Name project_name);

We now initialize Project by passing it a Name object, which might hold a bunch of data that's expensive to copy. Here's the key point that took me a long time to realize: the compiler needs to allocate project_name before it gives it to the constructor of Project, so the decision to move or copy happens before binding the object to the project_name parameter no matter if it's const or not.

For moving to work, Name needs an implicitly or explicitly defined move constructor that takes an rvalue reference from another instance of the same type:

Name(const NameClass& from); // copy constructor, vs: Name(NameClass&& from); // move constructor

As an intermediate step, the compiler creates a Name value using one of these constructors from the passed value before binding it to the function. If a valid move constructor exists, the compiler can choose to use it when the value received by Project() is an rvalue reference, like a return value from a function:

Name GetProjectName() { Name project_name; project_name.value = "einstein@home"; return project_name; // returns an rvalue reference to project_name } Project project(GetProjectName());

As shown above, the compiler needs to construct a Name object from the return value of GetProjectName() to pass to the Project constructor. Normally, return-value optimization kicks in, so an extra instantiation isn't needed, but if it doesn't, the next best choice is move construction.

This doesn't work for more common lvalues:

Name project_name = GetProjectName(); Project project(project_name); // copy!

Here, the return value is assigned to a variable, so it becomes an lvalue. If we try to pass that to the Project constructor, the compiler does a normal copy. Traditionally, it seems like C++ gets around this by using references:

Project::Project(const Name& project_name) : m_name(project_name) { ... }

By passing a reference to the constructor, the compiler doesn't add an immediate copy operation. However, because Project needs to take ownership of the value of project_name, we have to copy the value behind the reference into the m_name member field anyway.

It seems that C++11 provides the std::move() utility to work around these cases. The function converts an lvalue variable into an xvalue for functions that take rvalues to enable move construction (value...value...value...whew!).

Name project_name = GetProjectName(); Project project(std::move(project_name)); // project_name resources moved here

Our previous discussion above about Project's internal const fields illustrates how const does block moves by preventing the compiler from moving values into and out of the object's member fields. But for function parameters, the move occurs before the object binds to the const parameter.

The Whitelist data structures don't really benefit from move optimizations because their copy-on-write design requires exactly the opposite functionality, but I want to make sure I understand these concepts for any other code that I write. Similarly, moving doesn't work below--the compiler falls back to a copy of the const variable:

const Name project_name = GetProjectName(); Project project(std::move(project_name)); // copy!

CC: @jamescowens - would like your eyes on this too for fact checking 😅

@philipswift No claims of sanity were made here 🙂

I fail to find cases where passing by const value makes sense. By-value will create copy, no matter const, unless the function is inlined. But const pointer or const reference is good and useful.

In your last example, @cyrossignol , std::move turns & into &&. But it does not change const-ness. Since there is no ctor(const T&&), && decays to & and ctor(const T&) is used. Which is copy ctor.

Passing non-const value makes sense for values smaller than pointer and for values that will be moved inside the function. But move operation is not free either. Referencing is cheapest.
Impact of all this is reduced for code that is all in one translation unit. Because then the compiler may use (partial) inlining and do these optimizations for you.

Thanks, @tomasbrod -- I appreciate you taking the time to explain...this was one of the more confusing parts about learning C++ for me, and I'm finally starting to understand it.

During my testing, I was thrown-off by what appeared to be the compiler optimizing-away the copy for some (non-trivial) const by-value arguments, but I noticed it doesn't happen every time. Also, it seems as if moving into a const argument works just fine because there's often a compiler-generated move constructor. It's just for moving out of a const value that the compiler should perform a copy, but I think I observed that it doesn't always (maybe when it can avoid it).

I asked @denravonska about it on Slack:

All this makes sense to me so far, and I've settled on the belief that we can't use const if we want moves to work. There seems to be a conflict between const-correctness and move semantics. I'd like to know your opinion: is one more important than the other in the design of good C++ code? It makes sense to declare a parameter const if the function should not modify it, but we could cause an unwanted copy because the compiler really isn't supposed to move out of a const value. On the other hand, is it a bad form to omit const just to enable moves? Do you think there's a best practice?

...and he gave me similar advice: as a rule-of-thumb, non-const by-value, or const by-ref.

So, based on both of your suggestions, I'll write functions with:

non-const by-value parameters when the function needs to copy to take ownership anyway

const& parameters otherwise

mutable references or pointers only where strictly necessary

I've updated the code in this PR based on your recommendations. We can review in the next iteration.

denravonska · 2019-03-10T05:14:32Z

src/neuralnet/project.h

+//!
+//! \brief Smart pointer around a collection of projects.
+//!
+typedef std::shared_ptr<std::vector<Project>> ProjectListPtr;


It could be a good idea to also typedef std::vector<Project> ProjectList for readability later on.

I like that--will add to the next version.

Yep, similar to the kind of stuff I did in the scraper...

src/neuralnet/project.h

tomasbrod · 2019-03-14T09:27:31Z

src/neuralnet/neuralnet.cpp

+    const std::string& value,
+    const int64_t& timestamp)
+{
+    if (type == "project" || type == "projectmapping") {


what is projectmapping?

Looks like an old contract type name used before "project"... I didn't see any instances of that one on testnet... need to scan mainnet--if it doesn't exist there either, we can remove it.

Yep. Carried over from the old code.... probably vestigial.

tomasbrod · 2019-03-14T09:30:53Z

src/neuralnet/project.cpp

+        return Baseurl("") + "stats/";
+    }
+
+    return Baseurl("") + "stats/" + type + ".gz";


This is wrong assumption. Project stats urls are not required to adhere to .../stats/type.gz .

Agreed--until we change the format of URLs in the project contracts, I'm just doing it similarly to the way the old code does.

tomasbrod · 2019-03-14T09:38:10Z

src/neuralnet/project.h

+//! data with snapshots until the application mutates the data by adding or
+//! removing a project. During mutation, a \c Whitelist object copies its data
+//! and stores that behind a new smart pointer as an atomic operation, so any
+//! existing \c WhitelistSnapshot instances become obsolete.


What is the use case of mutable whitelist snapshot? Only admin message processing should modify whitelist. And that happens under mutex lock.

A snapshot is immutable. The comment about mutation applies only to the step that loads contracts from transaction messages. I'm trying to make it easier and faster to access the data by avoiding the lock on a mutex.

If it was me, I would KISS and just copy the structure to scraper when it needs to run.

Thanks @tomasbrod. I'm trying to KISS for the calling code that actually uses the project data. By abstracting the the details behind these classes, consumers don't need to worry about synchronization or efficient memory management. I'm working on beacon stuff that needs project data, and we could present it in other areas or features as well. The project data changes very infrequently, so I couldn't justify copying it every time something uses it 😄

tomasbrod · 2019-03-14T18:40:48Z

lock-free API

shared_ptr contain mutex, therefore it is not lock free

I appreciate this patch. It will be useful to have Project and Whitelist data structures for future changes to reward calculation. Not limited to the NN namespace.

jamescowens · 2019-03-14T19:19:39Z

Are you sure that is true Tomas? From https://en.cppreference.com/w/cpp/memory/shared_ptr

All member functions (including copy constructor and copy assignment) can be called by multiple threads on different instances of shared_ptr without additional synchronization even if these instances are copies and share ownership of the same object. If multiple threads of execution access the same shared_ptr without synchronization and any of those accesses uses a non-const member function of shared_ptr then a data race will occur; the shared_ptr overloads of atomic functions can be used to prevent the data race.

If their was an implicit mutex then why would all of those cautions be necessary, and why the overloads of atomic functions?

cyrossignol · 2019-03-14T19:41:23Z

@tomasbrod The information that I read while trying to figure out how to do this suggests that the implementation of shared_ptr uses atomics for reference counting when the target platform supports them--otherwise, it falls back to using a mutex. The data wrapped by the pointer requires its own manual synchronization if needed. I'm pretty new to C++, so maybe I misunderstood.

jamescowens · 2019-03-14T20:05:14Z

Anyway I agree with @tomasbrod this a hell of an improvement over the old code and will be a good basis for extension. @cyrossignol let's get this cleaned up so we can move towards a merge.

Start adding tests to the whitelist manager

…ests

cyrossignol · 2019-03-17T01:08:56Z

I think I've addressed all the reviews. Thanks to @denravonska, we have tests for these changes. I filled in the gaps there as requested.

I'll push the move to a contract handler interface after this. The scope is a little more involved, so it will go into a separate PR.

jamescowens · 2019-03-17T14:42:57Z

@cyrossignol I don't necessarily think the PR merge needs to wait on this, but listdata needs to be fixed up to use the new structure for keytype "project" so that listdata continues to work regardless of whether in the old appcache structure or new. This will have to be extended each time we move a section out of the appcache.

jamescowens · 2019-03-17T14:54:24Z

@cyrossignol, @denravonska and I discussed this and we have agreed that we actually should retire the listdata call as we peel off sections into their own structures. This means we should implement separate calls for each of the former "sections". You had a whitelist test rpc call that you removed that does this perfectly for this first section, project. I would recommend renaming it to getprojects. I think we should stick with the naming convention get where is the old section name.

Please do this and I will merge the PR.

@jamescowens

Added: - Replace NeuralNetwork with portable C++ scraper #1387 (@jamescowens, @tomasbrod, @Cycy, @TheCharlatan, @denravonska). - Allow compile flags to be used for depends #1423 (@G-UK). - Add stake splitting and side staking info to getmininginfo #1424 (@jamescowens). - Add freedesktop.org desktop file and icon set #1438 (@a123b). Changed: - Disable Qt for windows Travis builds #1276 (@TheCharlatan). - Replace use of AppCache PROJECT section with strongly-typed structures #1415 (@cyrossignol). - Change dumpwallet to use appropriate data directory #1416 (@jamescowens). - Optimize ExtractXML() calls by avoiding unnecessary string copies #1419 (@cyrossignol). - Change signature of IsLockTimeWithinMinutes #1422 (@jamescowens). - Restore old poll output for getmininginfo RPC #1437 (@a123b). - Prevent segfault when using rpc savescraperfilemanifest #1439 (@jamescowens). - Improve miner status messages for ineligible staking balances #1447 (@cyrossignol). - Enhance scraper log archiving #1449 (@jamescowens). Fixed: - Re-enable full GUI 32-bit Windows builds - part of #1387 (@jamescowens). - Re-activate Windows Installer #1409 (@TheCharlatan). - Fix Depends and Travis build issues for ARM #1417 (@jamescowens). - Fix syncupdate icons #1421 (@jamescowens). - Fix potential BOINC crash when reading projects #1426 (@cyrossignol). - Fix freeze when unlocking wallet #1428 (@denravonska). - Fix RPC after high priority alert #1432 (@denravonska). - Fix missing poll in GUI when most recent poll expired #1455 (@cyrossignol). Removed: - Remove old, rudimentary side staking implementation #1381 (@denravonska). - Remove auto unlock #1402 (@denravonska). - Remove superblock forwarding #1430 (@denravonska).

cyrossignol added 2 commits March 8, 2019 00:23

Centralize BOINC project whitelist logic and storage

9d9a976

Refactor to use strongly-typed NN::Whitelist API instead of PROJECT A…

c024aaf

…ppCache section

jamescowens requested review from jamescowens and denravonska March 9, 2019 15:11

jamescowens approved these changes Mar 9, 2019

View reviewed changes

jamescowens added this to the Denise milestone Mar 9, 2019

denravonska reviewed Mar 11, 2019

View reviewed changes

Start adding whitelist tests.

c2c4745

tomasbrod reviewed Mar 14, 2019

View reviewed changes

jamescowens assigned cyrossignol Mar 14, 2019

cyrossignol added 7 commits March 14, 2019 19:32

Merge pull request #1 from denravonska/cycy-beacon-tests

1d3d06e

Start adding tests to the whitelist manager

Move project whitelist globals into project.h and project.cpp

c63af12

Create ProjectList type alias for std::vector<Project>

c74ceb8

Eliminate the need for MutableProject when sorting a whitelist snapshot

d8d7f0a

Overwrite whitelist entries when adding a project with the same name

44fc825

Move project tests to neuralnet folder for later organization of NN t…

f186d5c

…ests

Finish writing tests for neural network project whitelist API

438391b

Remove the "whitelist" RPC method used for testing

e5fe2ac

denravonska approved these changes Mar 17, 2019

View reviewed changes

cyrossignol and others added 2 commits March 18, 2019 01:37

Add back project whitelist RPC method as "listprojects"

8d17497

Require addkey action scraper to use master key - Part II

199e0cb

jamescowens approved these changes Mar 18, 2019

View reviewed changes

jamescowens merged commit d407125 into gridcoin-community:development Mar 18, 2019

cyrossignol mentioned this pull request May 1, 2020

Overhaul the contract handling system #1669

Merged

Replace use of AppCache PROJECT section with strongly-typed structures #1415

Replace use of AppCache PROJECT section with strongly-typed structures #1415

Uh oh!

Conversation

cyrossignol commented Mar 9, 2019

Uh oh!

jamescowens commented Mar 9, 2019

Uh oh!

jamescowens left a comment

Choose a reason for hiding this comment

Uh oh!

jamescowens commented Mar 9, 2019

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cyrossignol Mar 11, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cyrossignol Mar 11, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cyrossignol Mar 12, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

tomasbrod Mar 18, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cyrossignol Mar 18, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

tomasbrod Mar 14, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cyrossignol Mar 14, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

cyrossignol Mar 11, 2019 •

edited

Loading

cyrossignol Mar 11, 2019 •

edited

Loading

cyrossignol Mar 12, 2019 •

edited

Loading

tomasbrod Mar 18, 2019 •

edited

Loading

cyrossignol Mar 18, 2019 •

edited

Loading

tomasbrod Mar 14, 2019 •

edited

Loading

cyrossignol Mar 14, 2019 •

edited

Loading

tomasbrod commented Mar 14, 2019 •

edited

Loading

jamescowens commented Mar 17, 2019 •

edited

Loading

jamescowens commented Mar 17, 2019 •

edited

Loading