Skip to content

Add support for merge keys.#300

Closed
Ortham wants to merge 2 commits intojbeder:masterfrom
loot:merge-key-support
Closed

Add support for merge keys.#300
Ortham wants to merge 2 commits intojbeder:masterfrom
loot:merge-key-support

Conversation

@Ortham
Copy link
Copy Markdown
Contributor

@Ortham Ortham commented Apr 4, 2015

This would close issue #41. While the merge key type isn't mentioned in the YAML 1.2 spec, there's a spec for YAML 1.1, and I know I'm not alone in finding it a very useful feature.

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add curly braces.

@jbeder
Copy link
Copy Markdown
Owner

jbeder commented Apr 8, 2015

In addition to the nits on the CL, I have a couple high-level questions:

  1. Why implement this at the YAML::Node level rather than the YAML::detail::node level? Couldn't node::get() look up the merge too? (honest question - I haven't tried, so I'm not sure of the answer)

  2. Right now, if you look up node["<<"], it'll give you that sequence or map; is that what we want? Or should that be undefined since we've "used" the value for merges.

  3. I really worry about this giving unexpected results to people, especially since it's not in the YAML 1.2 spec. If you don't know about merges, this is very strange behavior. I'd love to be able to turn them on or off (default off, I think), but I'm not sure how best to do that. What do you think?

@Ortham
Copy link
Copy Markdown
Contributor Author

Ortham commented Apr 8, 2015

  1. Why implement this at the YAML::Node level rather than the YAML::detail::node level? Couldn't node::get() look up the merge too?

TBH I haven't looked at YAML::detail::node. This was just the most straightforward way to implement a feature I wanted. If you feel that putting it in YAML::detail::node would be better design, I can take a look at it.

  1. Right now, if you look up node["<<"], it'll give you that sequence or map; is that what we want? Or should that be undefined since we've "used" the value for merges.

I think there are probably situations where being able to access the merge node itself would be beneficial, eg. during document conversion, where you might want to change what gets merged into a node before emitting the YAML.

  1. I really worry about this giving unexpected results to people, especially since it's not in the YAML 1.2 spec. If you don't know about merges, this is very strange behavior. I'd love to be able to turn them on or off (default off, I think), but I'm not sure how best to do that. What do you think?

Wrap the code in a preprocessor definition, eg. YAML_CPP_SUPPORT_MERGE_KEYS? I can't think of any use-case where having a runtime switch would be beneficial. I'd be perfectly happy with a default-off switch though.

@jbeder
Copy link
Copy Markdown
Owner

jbeder commented Apr 8, 2015

I suspect it'd be better to put the implementation in YAML::detail::node - the purpose of YAML::Node is just for wrapping the actual work in an API.

The problem with a preprocessor definition is that it's going in the headers, so we have to make sure people set it when they install the library. Maybe a CMake flag that generates the definition? That seems super hacky...

@Ortham
Copy link
Copy Markdown
Contributor Author

Ortham commented Apr 8, 2015

I suspect it'd be better to put the implementation in YAML::detail::node - the purpose of YAML::Node is just for wrapping the actual work in an API.

Okay, I'll do it then.

The problem with a preprocessor definition is that it's going in the headers, so we have to make sure people set it when they install the library. Maybe a CMake flag that generates the definition? That seems super hacky...

Edited, disregard what I previously wrote. I'm not sure there is a cleaner way of doing things.

@jbrezmorf
Copy link
Copy Markdown

Can not build since to_value wanted in GetValueFromMergeKey is defined in node/impl.h.
I have used explicit conversion to std::string in my fork, and it works.
Looking forward to have this feature in official master.

@Ortham
Copy link
Copy Markdown
Contributor Author

Ortham commented Aug 17, 2015

Can not build since to_value wanted in GetValueFromMergeKey is defined in node/impl.h.

Works for me, but I suppose it's bad design anyway. I've replaced to_value with std::string too now.

@Ortham
Copy link
Copy Markdown
Contributor Author

Ortham commented Mar 26, 2017

@jbeder: I've just rebased this PR on top of the latest master. Is there anything I need to address to get this merged?

@jbeder
Copy link
Copy Markdown
Owner

jbeder commented Mar 26, 2017

Did we ever work out a good API to make sure that this doesn't confuse people (and can be turned on/off)?

@Ortham
Copy link
Copy Markdown
Contributor Author

Ortham commented Mar 26, 2017

No, but I'm still partial to using a YAML_CPP_SUPPORT_MERGE_KEYS preprocessor definition.

E.g. in the yaml-cpp code it would be

#ifdef YAML_CPP_SUPPORT_MERGE_KEYS
    if (value.type() == NodeType::Undefined) {
       return *GetValueFromMergeKey(key, &value, pMemory);
    }
#endif

Somewhere in yaml-cpp's CMakeLists.txt there would be

option(YAML_CPP_SUPPORT_MERGE_KEYS "Build with support for YAML merge keys ('<<')" OFF)

IF (YAML_CPP_SUPPORT_MERGE_KEYS)
  add_definitions(-DYAML_CPP_SUPPORT_MERGE_KEYS)
ENDIF()

If I wanted to include merge key support I'd build it with cmake ... -DYAML_CPP_SUPPORT_MERGE_KEYS=ON, and include yaml-cpp as

#define YAML_CPP_SUPPORT_MERGE_KEYS
#include <yaml-cpp/yaml.h>

That way users have to explicitly opt-in to the behaviour, so there shouldn't be any nasty surprises.

@Ortham
Copy link
Copy Markdown
Contributor Author

Ortham commented Mar 27, 2017

I've committed my proposal for turning on/off merge key support.

@joseph-ireland
Copy link
Copy Markdown

joseph-ireland commented Apr 3, 2017

I disagree that this feature would be unexpected. Every major implementation does this with no easy option to disable: pyyaml, ruby stdlib, java snakeyaml, etc. Also it is mentioned in common introductory material, including the YAML wikipedia page.

Also, it is (almost) part of the 1.2 spec*:

10.4. Other Schemas
...
In addition, it is strongly recommended that such schemas make as much use as possible of the the YAML tag repository at http://yaml.org/type/. This repository provides recommended global tags for increasing the portability of YAML documents between different applications.

which links to a list of recommended interoperable schemas to support, which includes this merge feature.

*The interoperable schemas are intentionally left out of the official spec to allow them to evolve independently.

@Ortham
Copy link
Copy Markdown
Contributor Author

Ortham commented Apr 26, 2017

@jbeder How's my implementation of the support switch? Does @joseph-ireland's spec find (which I'd missed) change your stance on whether the support should be default on, default off or always on? What's (if anything) blocking this getting merged?

@jbrezmorf
Copy link
Copy Markdown

Strong support to merge this. Currently, we use the old version of "the merge support" and looking forward to the merge into master.

@joseph-ireland
Copy link
Copy Markdown

joseph-ireland commented Apr 27, 2017

There is an issue with editing documents in this PR. If you try to assign node["key"]=x, and there is a default value merged in with <<: *DEFAULTS, then it will assign into the defaults, and potentially change lots of other merged maps too.

I think a better approach would be to implement it at the NodeBuilder level, and do a single-level copy (this is what pyyaml does), i.e. OnMapEnd, if (node["<<"].IsMap()), loop over its keys and insert them into the node if not overriden, then remove the "<<". This will eliminate the *DEFAULTS ref in the above case, but not any nested references, acting consistently with other implementations

@Ortham
Copy link
Copy Markdown
Contributor Author

Ortham commented Apr 27, 2017

There is an issue with editing documents in this PR. If you try to assign node["key"]=x, and there is a default value merged in with <<: *DEFAULTS, then it will assign into the defaults, and potentially change lots of other merged maps too.

I'm not sure I understand, what would that look like (eg. as a test)?

I don't think I'm ever in a position where I edit a Node that may contain merged keys, so I guess I missed something...

@joseph-ireland
Copy link
Copy Markdown

joseph-ireland commented Apr 28, 2017

@WrinklyNinja I'd expect something like this to fail:

TEST(MergeKey, CheckAssignmentDoesntOverwriteDefault) {
    auto node = YAML::Load(R"(
defaults : &DEFAULTS
    merged: default
a:
    <<: *DEFAULTS
b:
    <<: *DEFAULTS
)");

    node["a"]["merged"] = "updated" ;
    // The node returned & assigned to will be node["defaults"]["merged"]
    // So I expect this will fail with node["b"]["merged"] == "updated"
    EXPECT_EQ(node["b"]["merged"].Scalar(), "default"); 
}

@Ortham
Copy link
Copy Markdown
Contributor Author

Ortham commented Apr 28, 2017

Ah, I understand now. I'll see about changing my implementation so it passes that test.

@Ortham
Copy link
Copy Markdown
Contributor Author

Ortham commented May 12, 2017

So I tried a naive reimplementation in NodeBuilder:

void NodeBuilder::OnMapEnd() {
  detail::node& node = *m_stack.back();
  const std::string mergeKey = "<<";
  detail::node& mergeValue = node.get(mergeKey, m_pMemory);

  if (mergeValue.type() == NodeType::Sequence) {
    for (detail::node_iterator sit = mergeValue.begin();
         sit != mergeValue.end(); ++sit) {
      InsertAll(*sit->pNode, node);
    }
  } else {
    InsertAll(mergeValue, node);
  }

  node.remove(mergeKey, m_pMemory);

  assert(m_mapDepth > 0);
  m_mapDepth--;
  Pop();
}

void NodeBuilder::InsertAll(detail::node& sourceNode, detail::node& destinationNode) {
  if (sourceNode.type() == NodeType::Map) {
    for (detail::node_iterator it = sourceNode.begin();
         it != sourceNode.end(); ++it) {
      destinationNode.insert(*it->first, *it->second, m_pMemory);
    }
  }
}

However, this fails because it's inserting references to existing nodes, as documented for the higher-level API. I had a look through the code but couldn't find any way to create a copy of a node (which would have to be recursive for protection against overwriting merged map values).

I may be missing something, but it seems like editing nodes is broken by design in general, and not just for merged nodes. Merging nodes just makes it easier to hit that brokenness, which I think is way beyond the scope of this PR to fix.

@Ortham Ortham force-pushed the merge-key-support branch 2 times, most recently from 724d481 to a66cf12 Compare August 23, 2018 18:45
Merge keys are specified here[1] for YAML 1.1. While not part of
the YAML 1.2 specification, they're very useful and are supported
in other implementations[2][3][4] that target 1.2.

Support for merge keys is optional and disabled by default. It can
be enabled by defining YAML_CPP_SUPPORT_MERGE_KEYS, either directly
or by setting the CMake option YAML_CPP_SUPPORT_MERGE_KEYS=ON.

[1]: http://yaml.org/type/merge.html
[2]: https://github.com/go-yaml/yaml
[3]: https://github.com/ruby/psych
[4]: https://bitbucket.org/ruamel/yaml
@Ortham Ortham force-pushed the merge-key-support branch from a66cf12 to 5d2dfe8 Compare August 23, 2018 18:53
@Ortham
Copy link
Copy Markdown
Contributor Author

Ortham commented Aug 23, 2018

@jbeder I've rebased this PR off a recent master commit, is there anything blocking it from being merged?

@hofbi
Copy link
Copy Markdown

hofbi commented Sep 3, 2018

Could you also add this as a Unit Test. This is related to issue #617 and I'm not sure if this will also work.

TEST(NodeTest, AnchorAndMergeKey) {
  Node node = Load(R"(
    a_root: &root_anchor
      key1: value1
      key2: value2
    b_child:
      <<: *root_anchor
      key2: value2_override
  )");

  ASSERT_FALSE(!node["a_root"]);
  ASSERT_FALSE(!node["b_child"]);
  EXPECT_EQ(value1, node["a_root"]["key1"].as<std::string>());
  EXPECT_EQ(value2, node["a_root"]["key2"].as<std::string>());
  EXPECT_EQ(value1, node["b_child"]["key1"].as<std::string>());
  EXPECT_EQ(value2_override, node["b_child"]["key2"].as<std::string>());
}

@hofbi
Copy link
Copy Markdown

hofbi commented Sep 4, 2018

Looks goot to me.

I just saw that PR #567 will also add such kind of test, so this PR should also fix the test of #567

@hofbi
Copy link
Copy Markdown

hofbi commented Sep 7, 2018

@WrinklyNinja Could you rebase and squash again that we can merge this

@Ortham
Copy link
Copy Markdown
Contributor Author

Ortham commented Sep 7, 2018

@hofbi Who is 'we'? You don't appear to have any authority to merge this (you're not the owner and aren't shown as a collaborator).

I added the previous test thinking that you were asking as a maintainer, I'm unwilling to act in the interests of an unknown party without knowing how it'll affect the chances of this PR finally getting merged. For all I know, @jbeder wouldn't want the test I added for you, or want this squashed.

@hofbi
Copy link
Copy Markdown

hofbi commented Sep 8, 2018

@WrinklyNinja

  1. Where does @jbeder say he doesn't want this test. The suggestion for this test was the outcome of issue Anchor/Alias crashs on accessing not overridden key/value #617.
  2. How should a Unit Test affect the outcome of this pull request. The scope of your PR is to support merge key. The test is about valid yaml with usage of merge key. Either you see the test passing which means this kind of test works or the test fails. In this case you know there is something which works not as expected.

@hofbi
Copy link
Copy Markdown

hofbi commented Oct 8, 2018

@jbeder Could you review the changes again please?

@jbeder
Copy link
Copy Markdown
Owner

jbeder commented Oct 8, 2018 via email

@Ortham
Copy link
Copy Markdown
Contributor Author

Ortham commented Oct 9, 2018

I'm still not sure if we want this. It's not in the official spec, and I don't like the idea of a preprocessor config.

As joseph-ireland noted, I was wrong in my initial PR message, as the spec does strongly recommend using merge keys, along with other types not included in the spec itself. Quoting the relevant section:

Interoperable schemas make use of global tags (URIs) that represent the same data across different programming languages. In addition, an interoperable schema may provide additional tag resolution rules. Such rules may provide additional regular expressions, as well as consider the path to the node. This allows interoperable schemas to use untagged nodes.

It is strongly recommended that such schemas be based on the core schema defined above. In addition, it is strongly recommended that such schemas make as much use as possible of the the YAML tag repository at http://yaml.org/type/. This repository provides recommended global tags for increasing the portability of YAML documents between different applications.

The tag repository is intentionally left out of the scope of this specification. This allows it to evolve to better support YAML applications.

Not supporting merge keys because they're not in the spec document itself seems to directly contradict the text.

@hofbi
Copy link
Copy Markdown

hofbi commented Oct 11, 2018

Also check node anchors of the spec http://yaml.org/spec/1.2/spec.html#id2785586

@timn
Copy link
Copy Markdown

timn commented Dec 5, 2018

We would like to use this to avoid repetitions of defaults in configurations, which can be rather annoying. I would say the cat is already out of the bag regarding merge keys, they are used heavily, e.g., for CI tool configuration (cf. CircleCI blog entry), and it's mentioned in tutorials. We also have an example in our Buildkite pipeline configuration, where we would need to repeat this snippet several times. Now, we would like to have the same capabilities in our YAML-based configuration implemented using yaml-cpp.

I think it would be good for yaml-cpp to merge this for compatibility with the rest what's out there in YAML-land.

@noizex
Copy link
Copy Markdown

noizex commented Jan 1, 2020

I'm still not sure if we want this. It's not in the official spec, and I don't like the idea of a preprocessor config.

Can't this be supported with a toggle? If someone wants "pure" spec-compliant version they can compile it with such flag, and those who find merge functionality to be one of the best things about YAML file composability and removing duplication could use the feature with yaml-cpp, rather than try to find some other way to do this. As far as I know this is widely supported by libs from other languages too?

@wieset
Copy link
Copy Markdown

wieset commented Jan 24, 2020

I was surprised today to find that yaml-cpp does not support merge keys. Super useful to avoid repetitions. Very much in favor of seeing this implemented!

arntanguy added a commit to arntanguy/lipm_walking_controller that referenced this pull request Mar 12, 2021
Since yaml-cpp does not support, nor intend to support YAML merge
anchors, this manually adds this ability to inherit from another robot's
plans configuration to avoid duplication

See jbeder/yaml-cpp#300
@andrewerf
Copy link
Copy Markdown

It's 2021 now. Any plans on adding this feature?

@perrysteger
Copy link
Copy Markdown

Would very much appreciate merge keys being supported in yaml-cpp. Took me a while to realize I was doing nothing wrong when I had a node key of "<<"

@jbeder
Copy link
Copy Markdown
Owner

jbeder commented Jul 10, 2021

I think the place this PR ended up is that I don't think it should be in the mainline of yaml-cpp because it's not part of the standard YAML 1.2 spec. I'm going to close this PR so that's clear.

@jbeder jbeder closed this Jul 10, 2021
@noizex
Copy link
Copy Markdown

noizex commented Jul 10, 2021

Well you not thinking it should be in the mainline was the start of this years long "conversation" where people tried to convince you that having this as a switchable feature would be beneficial for the library. At least that gives us official stance on this feature and maybe some incentive to start a fork.

@jbeder
Copy link
Copy Markdown
Owner

jbeder commented Jul 10, 2021

Yeah, that's totally fine. This probably would be a good feature on top of the library (like a call to Merge or something). I'm also looking for someone to take over yaml-cpp, so if that future person wants to merge this feature, that's fine too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants